A Nonparametric Multidimensional IRT Approach with Applications
to  Ability  Estimation  and  Test  Bias 

Introduction. The central thesis of this paper is that a successful approach to
such  fundamental  topics  as  bias,  consistent  estimation  of  the  ability  intended 
to  be  measured,  and  item  calibration  requires  a  nonparametric  multidimensional 
item  response  theory  (IRT)  modeling  approach  with  an  infinite  item  pool  assumed. 
Until  recently,  most  theoretical  and  applied  IRT  based  research  has  uncritically 
assumed  one  of  a  small  set  of  unidimensional,  locally  independent  monotone 
parametric  models;  e.g.,  one-,  two-,  or  three-parameter  logistic  and  normal 
ogive models for a fixed finite number of items. (See Lord (1980) for a survey
of  this  IRT  modeling  research  tradition  and  Mislevy  (1987)  for  a  survey  of 
current  IRT  modeling  research.) 

By contrast, this paper makes a determined case for the use of a nonparametric
multidimensional monotonic IRT modeling framework with local
independence replaced by a less restrictive and, we claim, psychometrically more
appropriate assumption, namely essential independence. In the spirit of factor
analysis,  essential  independence  together  with  essential  dimensionality  provide 
a  conceptual  basis  for  establishing  the  number  of  major  latent  dimensions  even 
in  the  presence  of  multiple  minor  dimensions.  Essential  unidimensionality,  the 
existence  of  exactly  one  major  dimension,  provides  a  conceptual  basis  for 
carrying  out  IRT  based  statistical  analyses  that  require  unidimensionality.  It 
is our position that a standard unidimensional IRT modeling approach should only
be used subsequent to a careful multivariate statistical analysis of
unidimensionality based on a more general nonparametric multidimensional
approach like the one herein. To use uncritically the standard unidimensional
three parameter logistic model in applications is the equivalent of Plato's cave
dweller's attempt to interpret the outside world entirely on the basis of
shadows cast on his cave wall.

Consequences  of  our  more  general  multidimensional  modeling  approach  include 
the  establishment  of  consistent  estimation  of  ability  on  a  common  ability  scale 
even when different examinees have taken different tests, and the existence of a
"unique" (appropriately defined) latent ability provided essential unidimensionality
holds. As a vital part of our proposed multidimensional IRT framework,
the concept of the intrinsic ability scale of a test is also presented.

Further, our approach leads to a re-examination of test bias from a multidimensional
perspective.

This  paper  continues  the  work  of  Stout  (1987),  where  essential 
unidimensionality  was  first  defined  and  a  statistical  test  of  essential 
unidimensionality was presented and explored.

The  paper  is  organized  as  follows:  Section  1  reviews  the  traditional 
multidimensional  IRT  model.  Section  2  defines  essential  dimensionality  and 
studies  some  of  its  basic  properties.  Section  3  considers  the  consistent 
estimation  of  ability  in  the  single-test  single-population  setting.  The 
uniqueness of the latent ability is considered. Section 4 cautions against the
overreliance on item parameter invariance and presents its relationship to
essential unidimensionality. An IRT based definition of validity is proposed.
Section 5 proposes a new definition of test bias and studies test bias from a
multidimensional modeling perspective. Section 6 considers the consistent
estimation  of  ability  using  any  of  a  large  class  of  linear  formula  scores 
including  proportion  correct.  Section  7  considers  the  consistent  estimation  of 
ability  in  multiple-test  multiple-population  settings.  Section  8  briefly 
discusses  and  summarizes  the  results  of  the  paper. 
1. Multidimensional Modeling. According to the latent trait viewpoint, each
examinee is indexed by a possibly vector valued and not necessarily distinct
variable $\underline{\Theta}$. Associated with each item $i$ is an item response function (IRF)
$P_i(\underline{\theta})$ that denotes the probability that a randomly chosen examinee from those
examinees with ability $\underline{\theta}$ will get the item right. Random sampling of
examinees from a specified population induces a distribution $F$ on $\underline{\Theta}$, and
hence a distribution on the test responses

$\underline{U}_N = (U_1, \ldots, U_N)$.

Here $U_i = 1$ denotes a correct response and $U_i = 0$ an incorrect response to
item $i$ by a randomly chosen examinee. Note that $P_i(\underline{\theta}) = P[U_i = 1 \mid \underline{\Theta} = \underline{\theta}] =
E[U_i \mid \underline{\Theta} = \underline{\theta}]$ for all $i$, $\underline{\theta}$. It is important to stress that a model can have
many latent model representations $(\underline{U}_N, \underline{\Theta})$. That is, there are many choices of
$\underline{\Theta}$ such that, for all $\underline{u}_N$,

(1.1)  $P[\underline{U}_N = \underline{u}_N] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P[\underline{U}_N = \underline{u}_N \mid \underline{\Theta} = \underline{\theta}] \, dF(\underline{\theta})$.


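Three characteristics of latent representations are of considerable
importance:

(i) The model $(\underline{U}_N, \underline{\Theta})$ is said to be a monotone model if $P_i(\underline{\theta})$ is
nondecreasing in $\underline{\theta}$ for each $i$ (here $\underline{\theta}_1 \le \underline{\theta}_2$ if and only if $\theta_{1i} \le \theta_{2i}$
for each coordinate $i$). M will denote such a monotone model.

(ii) The model $(\underline{U}_N, \underline{\Theta})$ is said to be d-dimensional if $\underline{\Theta}$ is a d-dimensional
random vector. The d dimensional ability is then denoted
by $(\Theta_1, \ldots, \Theta_d)$. The dimensionality of $\underline{\Theta}$ will be denoted by
$\dim(\underline{\Theta})$ or $d$.

(iii) The model $(\underline{U}_N, \underline{\Theta})$ is said to be a locally independent (LI) model if

(1.2)  $P[U_1 = u_1, \ldots, U_N = u_N \mid \underline{\Theta} = \underline{\theta}] = \prod_{i=1}^{N} P[U_i = u_i \mid \underline{\Theta} = \underline{\theta}]$

for all $\underline{\theta}$ and each of the $2^N$ choices of $(u_1, \ldots, u_N)$.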
The most commonly used class of models has been the LI, M, d = 1 models.
Usually, when M and d = 1 hold, the IRFs will be strictly monotone.

2. A New Conceptualization of Test Dimensionality. For later use, we first
establish that an LI, M latent model with latent dimensionality $\le N$ is always
possible for a given length $N$ test $\underline{U}_N$.

Theorem 2.1. For any test $\underline{U}_N$ there exists an LI, M latent model
representation $(\underline{U}_N, \underline{\Theta})$ with $\dim(\underline{\Theta}) \le N$.

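Proof of Theorem 2.1. Assume that the test $\underline{U}_N$ has a particular distribution.
Define

(2.1)  $\Theta_i = U_i$

for $i \le N$. Then the range of each $\Theta_i$ is $\{0, 1\}$ and for all $i$,

$P_i(\underline{\theta}) \equiv P[U_i = 1 \mid \underline{\theta}] = \theta_i$.

The intuitive interpretation of (2.1) is that $\underline{\theta}$ represents an examinee's state
of knowledge of the N items, in the sense that $\theta_i = 1$ or $0$ depending on
whether the examinee knows the answer to item $i$ or not. Clearly, monotonicity
holds because $P_i(\underline{\theta})$ is monotone in $\underline{\theta}$ for each $i$. In order to verify LI,
note that for all $\underline{u}$, $\underline{\theta}$,

(2.2)  $P[\underline{U}_N = \underline{u} \mid \underline{\Theta} = \underline{\theta}] = P[\underline{U}_N = \underline{u} \mid \underline{U}_N = \underline{\theta}] = \begin{cases} 1 & \text{if } \underline{u} = \underline{\theta} \\ 0 & \text{otherwise} \end{cases}$

and that (letting $0^0 = 1$ for convenience)

$\prod_{i=1}^{N} P_i(\underline{\theta})^{u_i} [1 - P_i(\underline{\theta})]^{1 - u_i} = \prod_{i=1}^{N} \theta_i^{u_i} (1 - \theta_i)^{1 - u_i} = \begin{cases} 1 & \text{if } \underline{u} = \underline{\theta} \\ 0 & \text{otherwise.} \end{cases}$

Thus LI holds. Finally, it must be verified that $(\underline{U}_N, \underline{\Theta})$ is a latent
representation in the sense that (1.1) holds: Note that (2.1) implies that
$P[\underline{\Theta} = \underline{\theta}] = P[\underline{U}_N = \underline{\theta}]$. Thus, applying (2.2) to the right hand side of (1.1)
with $\underline{u} = \underline{u}_N$ shows that (1.1) holds. □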
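The construction in the proof can be checked numerically for a small test. The following Python sketch is illustrative only: the function name `check_trivial_representation` and the three-item joint distribution are hypothetical choices, not part of the original development. It takes the latent vector equal to the response vector, uses the item response functions $P_i(\underline{\theta}) = \theta_i$, and checks that the right hand side of (1.1) reproduces the distribution of the test.

```python
from itertools import product


def check_trivial_representation(joint):
    """Return True if Theta = U_N with P_i(theta) = theta_i satisfies (1.1).

    `joint` maps each response pattern (a tuple of 0/1 values) to its
    probability; patterns that are missing are treated as probability 0.
    """
    n = len(next(iter(joint)))
    patterns = list(product((0, 1), repeat=n))
    for u in patterns:
        total = 0.0
        for theta in patterns:
            # Theta = U_N, so F puts mass P[U_N = theta] on the point theta.
            weight = joint.get(theta, 0.0)
            # Local independence product; Python evaluates 0 ** 0 as 1.
            li_product = 1.0
            for u_i, t_i in zip(u, theta):
                li_product *= (t_i ** u_i) * ((1 - t_i) ** (1 - u_i))
            total += li_product * weight
        if abs(total - joint.get(u, 0.0)) > 1e-12:
            return False
    return True


# A deliberately dependent three-item test (hypothetical probabilities).
joint = {(0, 0, 0): 0.4, (1, 1, 1): 0.3, (1, 0, 0): 0.2, (0, 1, 1): 0.1}
print(check_trivial_representation(joint))
```

The check succeeds for any joint distribution supplied, which is exactly why the representation carries no modeling content.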


This mathematically trivial theorem provides a certain insight: For a test
of length N, the assumption of an LI, M model is a totally nonrestrictive
assumption provided $\underline{\Theta}$ is allowed to have an N dimensional distribution.
Thus, given a sequence of tests $\{\underline{U}_N, N \ge 1\}$, each can be given an LI, M
representation $(\underline{U}_N, \underline{\Theta}_N)$ such that $\dim(\underline{\Theta}_N) \le N$. This fact will be used below.

It is important to note that, although mathematically helpful, the latent
trait $\underline{\Theta}$ of Theorem 2.1 is totally uninteresting from the modeling viewpoint.
For, unlike the $\underline{\Theta}$ of Theorem 2.1, a valid latent trait should surely be
something more abstract and indeed more "latent" than an examinee's state of
knowledge concerning a particular finite set of items. Surely no meaningful
cognitive construct (e.g., reading comprehension) is reducible to which of a
finite set of items an examinee can correctly answer. Further, it is clearly
inappropriate from a modeling viewpoint for the dimensionality of the test to
equal the test length.

The $\underline{\Theta}$ of Theorem 2.1 is uninteresting from another viewpoint as well.
Holland and Rosenbaum (1987) strongly make the point that an assumption true for
all models is "vacuous" and is neither a mathematical assumption (because it is
always satisfied) nor a scientific hypothesis (because it places no testable
restrictions on the behavior of observable data). The $\underline{\Theta}$ guaranteed to exist
by Theorem 2.1 is clearly of this vacuous character. What is interesting is
that for many tests $\underline{U}_N$ there do exist lower dimensional (than N) latent trait
representations. Indeed, the psychometrician's goal is to construct a test that
validly measures the construct of interest using items sufficiently
"homogeneous" that the test can be well modeled by a low dimensional or
hopefully even unidimensional model.

Let us recall the traditional IRT definition of test dimensionality:

Definition 2.1. The dimensionality $d$ of a test $\underline{U}_N$ is the minimal
dimensionality required for $\underline{\Theta}$ to make the latent model representation $(\underline{U}_N, \underline{\Theta})$
an LI, M representation.

Although  mathematically  appealing,  this  definition  is  rather  impractical 
for  mental  testing  because,  in  actual  practice,  individual  test  items  clearly 
have  multiple  determinants  of  their  respective  probabilities  of  correct 
response.  This  position  has  been  pursued  clearly  and  vigorously  by  Humphreys 
(1984),  who  states: 

The  related  problems  of  dimensionality  and  bias  of  items  are 
being  approached  in  an  arbitrary  and  over-simplified  fashion.  It 
should be obvious that unidimensionality can only be approximated.
Even in highly homogeneous tests the mean correlation
between  paired  items  is  quite  small.  The  large  amount  of  unique 
variance  in  items  is  not  random  error,  although  it  can  be  called 
error  from  the  point  of  view  of  the  attribute  that  one  is 
attempting  to  measure.  Test  theory  must  cope  with  these  small 
correlations.  We  start  with  the  assumption  that  responses  to 
items  have  many  causes  or  determinants. 

Humphreys  (1984)  asserts  that  dominant  attributes  (dimensions)  result  from 
overlapping  attributes  common  to  many  items.  Attributes  unique  to  individual 
items  or  common  to  relatively  few  items  are  unavoidable  and  indeed  are  not 
detrimental  to  the  measurement  of  dominant  dimensions.  In  his  writings, 
Humphreys  stresses  that  the  low  item  intercorrelations  researchers  have  observed 
argue  strongly  for  viewing  items  as  multiply  determined.  Although  the  existence 
of  multiply  determined  items  is  rarely  stressed  in  the  IRT  literature,  it  is  a 
theme  with  a  long  history  in  the  factor  analytic  test  theory  literature. 
Classical  factor  analysis  applied  to  binary  test  data  of  course  implicitly 
assumes  the  possibility  of  many  determinants,  allowing  for  many  determinants 
specific  to  individual  items  in  addition  to  one  or  more  dominant  dimensions. 
McDonald  (1981)  actually  argues  for  the  existence  of  "minor  components"  in 
factor analytic modeling of test data. That is, he argues for the existence of
multiple determinants, many of which are common to relatively few items at most.
Tucker, Koopman, and Linn (1969) have developed a factor analytic test
simulation model that includes "minor factors" as well as dominant factors and
unique factors. Tucker has specialized this model to binary item tests in work
yet  to  be  published. 

Unfortunately,  the  traditional  definition  (Definition  2.1),  with  its 
insistence  on  the  achievement  of  local  independence,  makes  no  distinction 
between  dominant  and  minor  factors.  Thus,  if  taken  seriously,  this  definition 
compels  us  to  take  as  test  dimensionality  the  total  number  of  all  item 
dimensions  rather  than  adopting  the  more  appropriate  "factor  analytic  viewpoint" 
by  which  only  the  number  of  dominant  dimensions  is  counted.  This  is  true  even 
in  situations  with  only  one  dominant  dimension  where,  from  both  a  psychometric 
and  a  data  analytic  viewpoint,  it  would  be  desirable  to  ignore  multiple 
determinants (i.e., minor and unique factors) and categorize tests as unidimensional.
Thus the traditional definition requires us to assign dimensionality
$d = d_0 > 1$ ($d_0$, thus assigned, possibly quite large in fact) in settings where
it would be desirable to assign $d = 1$. The following hypothetical example
illustrates  the  multidimensional  nature  of  items  in  tests  that  should  be 
considered  unidimensional. 

Example. Consider a multiple item "probability" test in which item 1 measures
ability  in  probability  but  is  influenced  by  the  examinees'  knowledge  of  an 
ordinary  deck  of  playing  cards,  item  2  measures  ability  in  probability  but  is 
influenced  by  the  examinees'  understanding  of  elementary  physics,  item  3 
measures  ability  in  probability  but  is  influenced  by  the  examinees'  knowledge  of 
elementary  Mendelian  genetics,  item  4  measures  ability  in  probability,  but... 

One is clearly forced to label such a test as multidimensional according to
the traditional conceptualization of dimensionality described above. Indeed, it
is clear that $d \ge 3$, with the dimensions including ability in probability in
the context of bridge knowledge, ability in probability in the context of
elementary physics knowledge, and ability in probability in the context of
knowledge of genetics.

The multicontextual nature of this example is deliberate. It seems
undesirable to construct, perhaps under the guise of eliminating biased items, a
context free probability test (even if possible), for it would probably not
measure what should be measured, namely the ability to solve probability
problems in a variety of contexts. Hence, whether the multiple determinants are
prominent as in the above example or more subtle, a probability exam would of
necessity comprise multiply determined items. Moreover, testing that comprises
multiply determined items is necessarily widespread and is in no way restricted
only to tests in probability. Clearly, it would be useful to have a concept of
test dimensionality that would allow such tests as the above to be considered
unidimensional. Such a conceptualization is provided by the essential
dimensionality of a set of items, defined below. This definition is designed to
count the number of dominant dimensions only, uninflated by the incidental
multidimensionality of items.

In order to present our definition of essential unidimensionality and to
study the asymptotic theory of ability estimation, it is necessary to view the
test as embedded in a sequence of tests $\underline{U}_1, \underline{U}_2, \ldots$, each obtained from
the previous one by the addition of one more item. Two justifications for the
realism of this shift in modeling perspective can be given: (1) If an actual
Item Banking scheme is being used to construct the test, then our embedding
scheme is totally realistic. Indeed, random sampling of items is commonly used
for criterion referenced tests constructed from item banks, according to
Hambleton and Swaminathan (1985, Chapter 12). (2) Even when there is no actual
sampling of items from a population of items, $\underline{U}_N$ certainly can be and we think
should be viewed as a representative sample of the infinite population that
would be constructed by continuing to generate items by whatever test
construction process has been used to generate the first N items $\underline{U}_N$.

It will be assumed throughout the remainder of the paper that $\underline{U}_N$ is
embedded in a sequence $\underline{U}_1, \underline{U}_2, \ldots, \underline{U}_N, \ldots$. This will be referred to as the item
pool formulation of IRT. This embedding is analogous to the mathematical
statistician's study of the estimation of a population mean, say, by a sequence
of estimators $\bar{X}_N = \sum_{i=1}^{N} X_i / N$ resulting from a sequence of random samples, each of
which is obtained from the preceding one by the selection of one more
observation. As with Justification (2) above, such a sampling model is often
used when the "population" being "sampled" from is only conceptual rather than
an actual population.


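We now define a weaker type of independence than local independence.

Definition 2.2. The latent model $\{\underline{U}_N, \underline{\Theta}, N \ge 1\}$ is said to be essentially
independent (EI) if the conditional distribution of $\underline{U}_N$ given $\underline{\Theta}$ in (1.1)
satisfies, for each $\underline{\theta}$ in the range of $\underline{\Theta}$,

(2.3)  $D_N(\underline{\theta}) \equiv \frac{\sum_{1 \le i < j \le N} |\mathrm{cov}(U_i, U_j \mid \underline{\Theta} = \underline{\theta})|}{\binom{N}{2}} \to 0$ as $N \to \infty$.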


Remarks. Mathematically, the notation $\{\underline{U}_N, \underline{\Theta}, N \ge 1\}$ in Definition 2.2 means
that the $\underline{U}_N$ are random vectors and $\underline{\Theta}$ a random vector defined on a common
probability space. This corresponds to the intuitive notion of an infinite item
pool and a fixed examinee population from which the sample is drawn. In this
paper, issues of rigor when using measure-theoretic probability, although always
surmountable, are suppressed in the interest of clarity.

It is informative to contrast the definition of essential independence with
the traditional latent trait conceptualization of local independence given in
(1.2). LI implies pairwise independence of all pairs $(U_i, U_j)$, $i \ne j$, given
$\underline{\theta}$, which is equivalent to $\mathrm{cov}(U_i, U_j \mid \underline{\Theta} = \underline{\theta}) = 0$ for all $\underline{\theta}$, $i \ne j$. By
contrast, EI only requires that for each fixed $\underline{\theta}$, $\mathrm{cov}(U_i, U_j \mid \underline{\Theta} = \underline{\theta})$ is small in
magnitude on average as the test length N grows.
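To make the contrast concrete, the averaged absolute conditional covariance appearing in (2.3) can be computed directly from a conditional covariance matrix at a fixed ability value. The sketch below is a minimal illustration under stated assumptions: the function names and the covariance value 0.2 are hypothetical, and the matrix has a single dependent pair of items with every other pair conditionally uncorrelated.

```python
def essential_independence_index(cond_cov):
    """Average of |cov(U_i, U_j | theta)| over the N(N-1)/2 pairs i < j.

    `cond_cov` is a square list of lists holding conditional covariances at
    one fixed ability value; diagonal entries are ignored.
    """
    n = len(cond_cov)
    pairs = n * (n - 1) // 2
    total = sum(abs(cond_cov[i][j]) for i in range(n) for j in range(i + 1, n))
    return total / pairs


def one_dependent_pair(n, c=0.2):
    """Hypothetical matrix: items 1 and 2 have covariance c, all other pairs 0."""
    cov = [[0.0] * n for _ in range(n)]
    cov[0][1] = cov[1][0] = c
    return cov


for n in (2, 10, 100):
    print(n, essential_independence_index(one_dependent_pair(n)))
```

Local independence fails for every test length in this illustration, because one pair always has nonzero conditional covariance, yet the index shrinks toward zero as items are added, which is the behavior essential independence asks for.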


Now essential dimensionality can be defined.

Definition 2.3. The essential dimensionality $d_E$ of a family of tests $\{\underline{U}_N\}$
is the minimal dimensionality required for a latent trait $\underline{\Theta}$ to make the latent
model representation $\{\underline{U}_N, \underline{\Theta}, N \ge 1\}$ an EI, M representation. When
$d_E = 1$, essential unidimensionality is said to hold. If essential $d_E$
dimensionality holds using ability $\underline{\Theta}$, then $\{\underline{U}_N\}$ is said to be essentially $d_E$
dimensional with respect to ability $\underline{\Theta}$. An essential trait $\underline{\Theta}$ is any latent
trait $\underline{\Theta}$ for which $\{\underline{U}_N, \underline{\Theta}, N \ge 1\}$ is an EI, M representation with the
essential dimensionality of $\underline{\Theta}$ the minimum possible.

Remarks. Although $d_E = 0$ is theoretically possible, it is psychometrically
uninteresting. Thus, to avoid irrelevant trivialities it is assumed that $d_E \ge 1$
for all latent representations considered in this paper.

The following theorem makes precise one way that essential unidimensionality
might occur.

Theorem 2.2. Suppose there is a random variable $\Theta$ such that for each $\theta$

$\sup_{|i - j| \ge N} |\mathrm{cov}(U_i, U_j \mid \Theta = \theta)| \to 0$ as $N \to \infty$.

Then essential unidimensionality holds.

Proof. Fix $\theta$. Fix $\epsilon > 0$. Choose $N_0$ such that

$\sup_{|i - j| \ge N_0} |\mathrm{cov}(U_i, U_j \mid \Theta = \theta)| < \epsilon$.

Then, for $N \ge N_0$,

$\sum_{1 \le i < j \le N} |\mathrm{cov}(U_i, U_j \mid \Theta = \theta)| \le N_0 N + N^2 \epsilon$.

Then $D_N(\theta) < 3\epsilon$ for N large, thus establishing essential unidimensionality. □


The  following  example  illustrates  the  difference  between  essential 
dimensionality  (Definition  2.3)  and  the  traditional  definition  (Definition  2.1) 
of  dimensionality. 

Example 2.1. Consider the construction of a paragraph comprehension test of
length $N = 5n$, where $n$ = number of paragraphs and each paragraph is followed
by five related questions. Assume total independence between questions
involving different paragraphs given $\Theta$, where for convenience we think of $\Theta$
as reading ability. Suppose that $\mathrm{cov}(U_i, U_j \mid \Theta = \theta) > 0$ for all $U_i$, $U_j$ for
the same paragraph (as should be the case) and that the IRFs are monotone.
Note, using $|\mathrm{cov}(U_i, U_j \mid \Theta = \theta)| \le 1$ for all $i$, $j$, $\theta$,

$D_N(\theta) \le n \binom{5}{2} \Big/ \binom{5n}{2} \le \frac{\text{const}}{N - 1} \to 0$ as $N \to \infty$.

Thus essential unidimensionality holds, whereas a traditional dimensionality of
$n + 1$ seems necessary for a test of length $N = 5n$. Reading ability ($\Theta$) is the
essential trait for this essentially unidimensional model. □
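The bound used in Example 2.1 is easy to evaluate for specific test lengths. The following sketch is a hedged illustration: the function name `paragraph_bound` and its parameters are hypothetical, and the bound of 1 on each absolute conditional covariance is the crude one quoted in the example. It counts the within-paragraph pairs, divides by the total number of item pairs, and compares the result with $4/(N-1)$, the value the ratio simplifies to when there are five items per paragraph.

```python
from math import comb


def paragraph_bound(n_paragraphs, items_per_paragraph=5, max_abs_cov=1.0):
    """Upper bound on the averaged absolute conditional covariance.

    Only pairs of items attached to the same paragraph may be dependent, and
    each such pair contributes at most `max_abs_cov`.
    """
    n_items = items_per_paragraph * n_paragraphs
    dependent_pairs = n_paragraphs * comb(items_per_paragraph, 2)
    return dependent_pairs * max_abs_cov / comb(n_items, 2)


for n in (2, 10, 100):
    n_items = 5 * n
    print(n_items, paragraph_bound(n), 4 / (n_items - 1))
```

Under these assumptions the two printed columns agree, so the constant in the example can be taken to be 4.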

The  example  illustrates  our  view  that  minor  or  idiosyncratic  dimensions 
should  be  ignored  in  assessing  test  dimensionality  from  the  applications 
viewpoint. Our requiring EI rather than LI is the key step that makes it
possible  to  ignore  minor  dimensions  in  assessing  dimensionality. 

Example 2.1 suggests an interesting sufficient condition for essential
unidimensionality: If there is one trait common to many items, if the other
traits are "orthogonal" to one another given this "dominant" trait, and if each
of the other traits influences only a bounded finite number of items, then
essential unidimensionality should hold.


Theorem 2.3. Let $U = \{U_i, i \ge 1\}$ be an item pool with $U$ partitioned as
$U = (U^{(1)}, U^{(2)}, \ldots)$. Suppose that the number of items in each $U^{(i)}$ is
uniformly bounded in $i$ and that local independence holds with respect to
$\underline{\Theta} = (\Theta, \Theta_1, \Theta_2, \ldots)$. Suppose that the item response functions for each item of
$U^{(i)}$ depend on $\theta$, $\theta_i$ only, and that $\Theta_i$ and $\Theta_{i'}$ are conditionally
independent given $\Theta$ for all $i \ne i'$. Suppose for all $i$ that

(2.4)  $P_{ji}(\theta) \stackrel{\text{def}}{=} \int P_{ji}(\theta, \theta_i) \, dP(\theta_i \mid \theta)$

is monotone nondecreasing in $\theta$, where $P_{ji}(\theta, \theta_i)$ is the item response
function of the jth item of $U^{(i)}$ and $P(\theta_i \mid \theta)$ is the conditional distribution of
$\Theta_i$ given $\Theta = \theta$. Then $\{\underline{U}_N, N \ge 1\}$ is essentially unidimensional with respect
to ability $\Theta$.

Remarks. In the statement of Theorem 2.3 above, $\Theta$ is the essential trait. The
hypotheses of Theorem 2.3 can easily be modified to allow each individual item
to  depend  on  more  than  one  nuisance  parameter,  or  to  allow  the  number  of  items 
in  each  category  to  grow  slowly. 

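Proof of Theorem 2.3. Choose $U_j$, $U_{j'}$ from different partitions, say $i$ and
$i'$. Then, denoting the joint density of $\Theta_i$ and $\Theta_{i'}$ given $\Theta = \theta$ by
$f(\theta_i, \theta_{i'} \mid \theta)$,

$E(U_j U_{j'} \mid \Theta = \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P_{ji}(\theta, \theta_i) P_{j'i'}(\theta, \theta_{i'}) f(\theta_i, \theta_{i'} \mid \theta) \, d\theta_i \, d\theta_{i'}$

$= \int_{-\infty}^{\infty} P_{j'i'}(\theta, \theta_{i'}) \left[ \int_{-\infty}^{\infty} P_{ji}(\theta, \theta_i) f(\theta_i \mid \theta) \, d\theta_i \right] f(\theta_{i'} \mid \theta) \, d\theta_{i'}$

$= E[U_j \mid \Theta = \theta] \, E[U_{j'} \mid \Theta = \theta]$.

Thus, given $\Theta$, $U_j$ and $U_{j'}$ are conditionally independent, and so
$\mathrm{cov}(U_j, U_{j'} \mid \Theta = \theta) = 0$. Now let the number of items $K_i$ in each partition set
$U^{(i)}$ be bounded by $K$. Only pairs of items from the same partition set can then
contribute to $D_N(\theta)$, and each contributes at most 1, so $D_N(\theta)$ satisfies for all $\theta$

$D_N(\theta) \le \sum_{i=1}^{n_N} \binom{K_i}{2} \Big/ \binom{N}{2}$,

where $n_N$ = the number of partitions into which the first N items are split.
Thus, noting that $\sum_{i=1}^{n_N} K_i = N$, for all $\theta$

$D_N(\theta) \le \frac{\sum_{i=1}^{n_N} K_i^2}{N(N-1)} \le \frac{KN}{N(N-1)} = \frac{K}{N-1} \to 0$ as $N \to \infty$,

which establishes the result since M holds for the $P_{ji}(\theta)$ of (2.4) by hypothesis.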


3. Application to Consistent Estimation of Ability. We turn now to the problem
of estimating a particular latent ability $\Theta$ in the presence of other
("nuisance") abilities. In order to illuminate certain theoretical issues most
clearly, ability estimation will be considered in its simplest setting: we
consider a fixed test for a single examinee population with no consideration of
scaling/equating issues such as the need to find a common ability scale when
using more than one test. Later in Sections 4, 6, and 7 we will address some of
the practical problems raised when the rather strict one-test one-population
assumptions are relaxed.

We suppose that $\{\underline{U}_N, \underline{\Theta}, N \ge 1\}$ is either essentially $d_E$ dimensional or
traditionally $d$ dimensional with respect to $\underline{\Theta}$ for some $d_E \ge 1$ or $d \ge 1$
respectively. The item response functions for $\{\underline{U}_N, \underline{\Theta}, N \ge 1\}$ are denoted by
$P_i(\underline{\theta})$. Let $\Theta$ be the ability desired to be estimated, and suppose that $\underline{\Theta}$
determines $\Theta$; i.e., that $\Theta$ is a function of $\underline{\Theta}$.

In this single population problem there is nothing unique or preferable
about the $\Theta$ scale. That is, any strictly increasing transformation $A(\Theta)$
yields an equally acceptable scale for purposes of estimating $\Theta$. Let

(3.1)  $P_i(\theta) = E[P_i(\underline{\Theta}) \mid \Theta = \theta] = P[U_i = 1 \mid \Theta = \theta]$

(the distinction between $P_i(\underline{\theta})$ and $P_i(\theta)$ henceforth assumed clear from
context). The $P_i(\theta)$s are called the marginal item response functions with
respect to ability $\Theta$. Let

(3.2)  $A_N(\theta) = \sum_{i=1}^{N} P_i(\theta) / N$.

$A_N(\theta)$ is called the intrinsic ability scale for $\Theta$ relative to the test $\underline{U}_N$
and to the examinee population $\underline{\Theta}$. $A_N(\theta)$ has an interpretation bridging
classical test theory and IRT: $A_N(\theta)$ is the expected test score, that is, true
score, among all examinees with latent ability $\theta$. Under the assumption of
strict monotonicity, Theorem 3.1 below implies that $A_N(\theta)$ is strictly
increasing in $\theta$ and hence is an acceptable scale for estimating $\theta$.
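As a small numerical illustration of (3.2), the intrinsic ability scale can be evaluated as the average of the marginal item response functions at each ability value. The sketch below is an assumption-laden example, not part of the nonparametric development: the logistic curves, their difficulty values, and the function names are hypothetical and serve only to supply strictly increasing marginal IRFs.

```python
import math


def logistic_irf(difficulty):
    """Hypothetical strictly increasing marginal IRF, for illustration only."""
    return lambda theta: 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def intrinsic_scale(theta, irfs):
    """Average of the marginal IRFs at theta, as in (3.2)."""
    return sum(p(theta) for p in irfs) / len(irfs)


irfs = [logistic_irf(b) for b in (-1.0, 0.0, 1.0)]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
scale = [intrinsic_scale(t, irfs) for t in grid]
for t, a in zip(grid, scale):
    print(f"theta = {t:+.1f}   A_N(theta) = {a:.4f}")
```

With strictly increasing IRFs the computed values increase with the ability value and stay between 0 and 1, so they can be read as expected proportion correct scores.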

Considerable recent attention has been focused on nonmonotone unidimensional
item response functions. It has been shown that attractive distractors
are a source of nonmonotonicity. It has been suggested that the existence of
attractive distractors may be explainable by multidimensionality of the ability
space. In this regard, it is interesting to note that $P_i(\underline{\theta})$ can be monotone
and yet $P_i(\theta)$ nonmonotone:

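Example 3.1. Let $P(\theta_1, \theta_2) = (\theta_1 + \theta_2)/17$, $1/4 \le \theta_1 \le 1$, $0 \le \theta_2 \le 16$, and
$f(\theta_2 \mid \theta_1) = \theta_1/4$ if $0 \le \theta_2 \le 4/\theta_1$; $= 0$ otherwise. Then

$P_1(\theta_1) = \int_0^{4/\theta_1} \left[ \frac{\theta_1 + \theta_2}{17} \right] \frac{\theta_1}{4} \, d\theta_2 = \left( \theta_1 + \frac{2}{\theta_1} \right) \frac{1}{17}$,  $1/4 \le \theta_1 \le 1$.

But $P_1(\theta_1)$ is decreasing in $\theta_1$ for all $\theta_1$. □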
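The computation in Example 3.1 can be verified numerically. The sketch below is illustrative, with hypothetical function names and an arbitrarily chosen grid; it integrates the two-dimensional item response function against the conditional density by the midpoint rule, compares the result with the closed form $(\theta_1 + 2/\theta_1)/17$, and lists the values over $1/4 \le \theta_1 \le 1$.

```python
def joint_irf(theta1, theta2):
    """Two-dimensional IRF of Example 3.1; increasing in both arguments."""
    return (theta1 + theta2) / 17.0


def marginal_irf(theta1, steps=2000):
    """Midpoint-rule integral of joint_irf(theta1, t) * (theta1 / 4)
    over 0 <= t <= 4 / theta1."""
    upper = 4.0 / theta1
    width = upper / steps
    density = theta1 / 4.0
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * width
        total += joint_irf(theta1, t) * density * width
    return total


def closed_form(theta1):
    """Closed form of the marginal IRF stated in the example."""
    return (theta1 + 2.0 / theta1) / 17.0


grid = [k / 20 for k in range(5, 21)]  # 0.25, 0.30, ..., 1.00
numeric = [marginal_irf(t) for t in grid]
for t, value in zip(grid, numeric):
    print(f"{t:.2f}  {value:.6f}  {closed_form(t):.6f}")
```

The derivative of the closed form is $(1 - 2/\theta_1^2)/17$, which is negative whenever $\theta_1 < \sqrt{2}$, so the marginal function falls throughout the stated range even though the two-dimensional function rises in each coordinate.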

Of course, as is intuitively clear, mild and natural regularity conditions
preclude this nonmonotone behavior. Indeed, the nonmonotonicity of a projected
item response function can occur only when the multidimensional ability $\underline{\Theta}$ has
some sort of negative association among its components.

Definition 3.1. A random vector $Y$ is said to be stochastically larger than a
random vector $X$ if, for all $t$,

(3.3)  $P[X > t] \le P[Y > t]$

with strict inequality for at least one $t$.

The following fact is well known:

Lemma 3.1. Let $Y$ be stochastically larger than $X$, and let $f$ be a nonnegative
nondecreasing real valued function. Then

$E f(X) \le E f(Y)$.

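Theorem 3.1. Let $(\underline{U}_N, \underline{\Theta})$ be a monotone representation. Let $\underline{\Theta} \equiv (\Theta, \Theta_2)$.
Suppose that, for every pair $\theta' < \theta''$, the distribution of $\Theta_2$ given
$\Theta = \theta''$ is stochastically larger than the distribution of $\Theta_2$ given $\Theta = \theta'$.
Then, for $i \le N$, each marginal item response function $P_i(\theta)$ of (3.1) is
nondecreasing in $\theta$.

Proof. It must be shown that

$\int P_i(\theta, \theta_2) \, dP(\theta_2 \mid \theta)$

is nondecreasing in $\theta$, where $P(\theta_2 \mid \theta)$ denotes the distribution of $\Theta_2$ given
$\Theta = \theta$. Fix $\theta' < \theta''$. By Lemma 3.1, noting that $P_i(\theta, \theta_2)$ is for fixed $\theta$ a
nondecreasing function of $\theta_2$ by the assumption of monotonicity,

(3.4)  $\int P_i(\theta', \theta_2) \, dP(\theta_2 \mid \theta') \le \int P_i(\theta', \theta_2) \, dP(\theta_2 \mid \theta'')$.

But, because $P_i(\underline{\theta}) = P_i(\theta, \theta_2)$ is nondecreasing in $\theta$ for each fixed $\theta_2$,

(3.5)  $\int P_i(\theta', \theta_2) \, dP(\theta_2 \mid \theta'') \le \int P_i(\theta'', \theta_2) \, dP(\theta_2 \mid \theta'')$.

The combination of (3.4) and (3.5) yields the desired result. □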

We now turn directly to the ability estimation problem. As we shall see,
essential unidimensionality characterizes the consistent estimation of some
unidimensional latent ability; moreover, it implies that, in a certain sense,
the latent ability is unique. It is in this spirit that an essential trait
(recall Definition 2.3) can be referred to as "the" essential trait with respect
to which the items are essentially unidimensional.

Theorem 3.2 below asserts that essential unidimensionality is precisely the
condition needed for consistent estimation of ability. The main estimation tool
is proportion correct $\bar{U}_N$. Later in Sections 6 and 7 generalizations based on
"linear formula scores" are explored.

Before stating Theorem 3.2, we must carefully consider what it means to
consistently estimate $\Theta$ using an item pool formulation. Recall our viewpoint
that any strictly monotone transformation of $\theta$ -- for example $A_N(\theta)$, which is
strictly monotone when the IRFs are -- is an acceptable scale on which to
estimate $\theta$. Clearly $\bar{U}_N$ is a natural estimator of $A_N(\theta)$.

Let $N(C)$ denote the cardinality of a finite set $C$. For $\epsilon > 0$, let $\mathcal{C}_N(\epsilon)$
denote the collection of all subsets $C$ of $\{1, \ldots, N\}$ such that $N(C)/N \ge \epsilon$.
For example, $\mathcal{C}_N(1/2)$ consists of all subsets containing at least half the
integers between 1 and N. We will call a sequence $\{C_N, N \ge 1\}$ of integer
subsets nonsparse provided there exists $\epsilon > 0$ such that $C_N \in \mathcal{C}_N(\epsilon)$ for every
N. Let $C \in \mathcal{C}_N(\epsilon)$. Define

$\bar{U}_C = \sum_{i \in C} U_i / N(C)$.

Definition 3.2. It is said that $\Theta$ may be consistently estimated (in probability) using the sequence $\{\mathbf{U}_N, N \ge 1\}$ of items if for every nonsparse sequence $\{C_N, N \ge 1\}$ there exists a sequence $\{g_{C_N}(\theta), N \ge 1\}$ of functions of $\theta$ such that for each $\theta$, given $\Theta = \theta$,

(3.6)  $\bar U_{C_N} - g_{C_N}(\theta) \to 0$

in probability as $N \to \infty$.


Remarks. The intuitive idea of the definition is that any nonsparse subsequence of items should be usable to estimate $\Theta$ in the sense of (3.6). Also, not all sparse subsequences of items need necessarily be usable to estimate $\Theta$. For example, if every $2^k$th item in a "mathematics" test were a "verbal" item, we would still be able to consistently estimate $\Theta$ (mathematics), since the bad sequence of estimators $\{\sum_{k=1}^{M} U_{2^k}/M,\ M \ge 1\}$ is formed from too sparse a subsequence of the items $\{U_i, i \ge 1\}$. However, Definition 3.2 does, for example, require $\{\sum_{k=1}^{M} U_{10k}/M,\ M \ge 1\}$ to estimate $\Theta$ in the sense of (3.6). Thus, if every 10th item were a verbal item, the sequence of items $\{\mathbf{U}_N, N \ge 1\}$ would not be able to consistently estimate $\Theta$.
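The contrast between the two index sets in these remarks can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names are hypothetical. It computes the fraction $N(C)/N$ for the item positions $2^k$ and for the multiples of 10, showing that the former fraction shrinks toward zero while the latter stays at 1/10, which is what makes only the second subsequence nonsparse.

```python
def density(indices, n):
    """Fraction N(C)/N of the integers 1..n that belong to the index set."""
    return sum(1 for i in indices if 1 <= i <= n) / n


def powers_of_two(n):
    """Item positions 2, 4, 8, ... not exceeding n (the sparse 'verbal' items)."""
    out, p = [], 2
    while p <= n:
        out.append(p)
        p *= 2
    return out


def multiples_of_ten(n):
    """Item positions 10, 20, 30, ... not exceeding n (a nonsparse subsequence)."""
    return list(range(10, n + 1, 10))


for n in (100, 10_000, 1_000_000):
    # First density tends to 0; second stays at 0.1 for these n.
    print(n, density(powers_of_two(n), n), density(multiples_of_ten(n), n))
```

Because the density of the multiples of 10 is bounded below by a fixed $\varepsilon$, Definition 3.2 requires that subsequence to estimate $\Theta$ as well, whereas the powers of two impose no such requirement.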

A reasonable question to ask is why our definition of consistent estimation should require (3.6) rather than merely requiring the existence of functions $\{g_N(\theta)\}$ such that for each given $\theta$

(3.7)  $\bar U_N - g_N(\theta) \to 0$

in probability as $N \to \infty$. One reason is that if a test is formed by sampling items, as in item banking or computerized adaptive testing, then clearly $\Theta$ must be estimable using any reasonable sequence $C_N$ in (3.6). A second reason that requiring (3.7) is inappropriate is that it is vacuous in the sense that every test of fixed length $N$ can be viewed as embedded in an essentially $d_E$ dimensional sequence of tests $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ such that (3.7) holds for a judicious choice of $\Theta$. The following example illustrates this embedding for a test where 50% of the items measure one trait and 50% measure a second trait. It presents an essentially two dimensional family of tests where (3.7) is satisfied for a mathematically judicious choice of $\Theta$, $\Theta$ being some function of the dimensions $(\Theta_1, \Theta_2)$. However, (3.6), which postulates consistent estimation of $\Theta$ by all nonsparse subsequences of items, is seen to fail in the sense that two nonsparse subsequences of items can be selected that estimate $\theta_1$ and $\theta_2$ respectively. This is a situation where most psychometricians would prefer to split the test up into two unidimensional tests and only then address the issue of consistency. Requiring (3.6) instead of (3.7) as a definition of consistency reflects these considerations.

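Example 3.2. Let $\{\mathbf{U}_{2K}, \boldsymbol{\Theta}\}$ be an LI, M family of latent variable models with $\boldsymbol{\Theta} = (\Theta_1, \Theta_2)$ and, for $1 \le i \le K$,

$P_{2i}(\boldsymbol{\theta}) = \theta_1, \qquad P_{2i-1}(\boldsymbol{\theta}) = \theta_2,$

where the distribution of $\boldsymbol{\Theta}$ is given by $\Theta_1, \Theta_2$ independent identically distributed with $\Theta_1$ uniformly distributed on [0,1]. Let $\Theta = \Theta_1 + \Theta_2$. Fix $\theta$ with $0 \le \theta \le 1$. Then standard multivariable calculus yields, for $1 \le i, j \le K$ and $1 \le k \le 2K$,

(3.8)  $E(U_k \mid \Theta = \theta) = \theta/2$

and, if $i \ne j$,

$\mathrm{cov}(U_{2i}, U_{2j} \mid \Theta = \theta) = \mathrm{cov}(U_{2i-1}, U_{2j-1} \mid \Theta = \theta) = -\mathrm{cov}(U_{2i-1}, U_{2j} \mid \Theta = \theta) = \theta^2/12$

(the last identity also holds for $i = j$). Thus, using (3.8),

$\mathrm{Var}(\bar U_{2K} \mid \Theta = \theta) = \frac{1}{(2K)^2} \Big[ \sum_{i=1}^{2K} \mathrm{Var}(U_i \mid \Theta = \theta) + \sum_{1 \le i \ne j \le 2K} \mathrm{cov}(U_i, U_j \mid \Theta = \theta) \Big] = \frac{1}{2K} \cdot \frac{\theta}{2} \Big[ 1 - \frac{\theta}{2} \Big] - \frac{\theta^2}{24K} \to 0$ as $K \to \infty$.

Further, $E[\bar U_{2K} \mid \Theta = \theta] = \theta/2$. Thus

$P[\,|\bar U_{2K} - \theta/2| > \varepsilon \mid \Theta = \theta\,] \le \mathrm{Var}(\bar U_{2K} \mid \Theta = \theta)/\varepsilon^2 \to 0$

as $K \to \infty$. Hence, for each $\theta$, given $\Theta = \theta$,

$\bar U_{2K} - \theta/2 \to 0$

in probability as $K \to \infty$. A similar analysis holds for $\bar U_{2K+1}$ and also for $1 < \theta < 2$. Thus (3.7) does hold in what is clearly an essentially two dimensional sequence of tests. However, it is intuitively clear that (3.6) fails, since the even items can be used to consistently estimate $\theta_1$ and the odd items to consistently estimate $\theta_2$. □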
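The variance calculation in Example 3.2 can be cross-checked with exact rational arithmetic. The sketch below is illustrative only and makes explicit assumptions: even-numbered items have success probability $\theta_1$, odd-numbered items have success probability $\theta_2$, $0 \le \theta \le 1$, and, given $\Theta_1 + \Theta_2 = \theta$, $\Theta_1$ is uniform on $[0, \theta]$. The function names are hypothetical. The first two functions compute the conditional variance of the full-test proportion correct in two ways; the third computes the conditional variance of the even-item proportion correct, which does not vanish.

```python
from fractions import Fraction


def var_full_closed_form(theta, K):
    """Var(U-bar_{2K} | Theta = theta) from the closed form in Example 3.2."""
    theta = Fraction(theta)
    return (theta / 2) * (1 - theta / 2) / (2 * K) - theta**2 / (24 * K)


def var_full_by_conditioning(theta, K):
    """Same variance via the law of total variance, conditioning on Theta_1 = t.

    Given t, the conditional mean of U-bar_{2K} is theta/2 for every t, so only
    the expected within-t variance contributes.
    """
    theta = Fraction(theta)
    # E[t(1 - t)] for t uniform on [0, theta]; the same value holds for theta - t.
    expected_item_var = theta / 2 - theta**2 / 3
    return 2 * K * expected_item_var / (2 * K) ** 2


def var_even_items(theta, K):
    """Var of the mean of the K even items given Theta = theta.

    The term theta^2 / 12 = Var(Theta_1 | Theta = theta) does not shrink with K.
    """
    theta = Fraction(theta)
    return (theta / 2 - theta**2 / 3) / K + theta**2 / 12


theta = Fraction(1, 2)
for K in (10, 1000, 100000):
    print(K, var_full_closed_form(theta, K), var_even_items(theta, K))
```

The full-test variance decays like $1/K$, in agreement with (3.7), while the even-item variance is bounded below by $\theta^2/12$, so no centering function of $\theta$ alone can make the even-item proportion correct converge; this is the failure of (3.6).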
Now  Theorem  3.2  can  be  stated  and  proved. 

Theorem 3.2. Let $\{\mathbf{U}_N\}$ be essentially unidimensional with respect to ability $\Theta$. Then $\Theta$ may be consistently estimated. In particular, for each given $\Theta = \theta$, (3.7) holds.

Conversely, if for some monotone latent model $\{\mathbf{U}_N, \Theta\}$ the unidimensional $\Theta$ may be consistently estimated, then $\{\mathbf{U}_N, \Theta\}$ is an EI, M representation and hence essential unidimensionality holds.

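Proof. Assume essential unidimensionality. Fix $\varepsilon > 0$ and $\theta$. By Chebyshev's inequality,

(3.9)  $P[\,|\bar U_N - A_N(\theta)| > \varepsilon \mid \Theta = \theta\,] \le \mathrm{Var}(\bar U_N \mid \Theta = \theta)/\varepsilon^2$

since $E[\bar U_N \mid \Theta = \theta] = A_N(\theta)$. But, noting that

(3.10)  $\sum_{i=1}^{N} \mathrm{Var}(U_i \mid \Theta = \theta) \le N/4,$

it follows that

$\mathrm{Var}(\bar U_N \mid \Theta = \theta) \le \frac{1}{4N} + \frac{2}{N^2} \sum_{1 \le i < j \le N} \mathrm{cov}(U_i, U_j \mid \Theta = \theta) \to 0$ as $N \to \infty$

by the assumption of essential unidimensionality with respect to $\Theta$. Thus (3.7) follows with $g_N(\theta) = A_N(\theta)$. Now consider any nonsparse sequence $\{C_N, N \ge 1\}$. It follows from essential unidimensionality that

$\frac{1}{N(C_N)^2} \sum_{i, j \in C_N,\ i \ne j} \mathrm{cov}(U_i, U_j \mid \Theta = \theta) \to 0$

since for some $\varepsilon > 0$, $N(C_N)/N \ge \varepsilon$ for all $N$. Thus the same argument that established (3.7) yields (3.6); i.e., consistent estimation of $\Theta$. Here

$A_{C_N}(\theta) = \sum_{i \in C_N} P_i(\theta)/N(C_N)$

serves as the centering function.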

Conversely, suppose $\Theta$ may be consistently estimated. Consider first the case of $C_N = \{1, \ldots, N\}$. Let $\{g_N(\theta), N \ge 1\}$ denote the centering functions for $\bar U_N$ guaranteed to exist by Definition 3.2. Suppose without loss of generality that $0 \le g_N(\theta) \le 1$ for all $N, \theta$. Note that

(3.11)  $0 \le \mathrm{Var}\Big(\sum_{i=1}^{N} U_i \,\Big|\, \Theta = \theta\Big) = \sum_{i=1}^{N} \mathrm{Var}(U_i \mid \Theta = \theta) + 2 \sum_{1 \le i < j \le N} \mathrm{Cov}(U_i, U_j \mid \Theta = \theta),$

which implies by (3.10) that

(3.12)  $\sum_{1 \le i < j \le N} \mathrm{Cov}(U_i, U_j \mid \Theta = \theta) \ge -N/8.$

Therefore

$\frac{1}{N^2} \sum_{1 \le i \ne j \le N} \mathrm{Cov}(U_i, U_j \mid \Theta = \theta)$

cannot have any negative limit points.

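For any bounded random variable $X$, denoting the bound by $a$ (i.e., $|X| \le a$),

(3.13)  $P[\,|X| > \varepsilon\,] \ge (E X^2 - \varepsilon^2)/a^2.$

Thus, letting $X = \bar U_N - g_N(\theta)$ (so that $a = 1$), for $\varepsilon > 0$,

(3.14)  $P[\,|\bar U_N - g_N(\theta)| > \varepsilon \mid \Theta = \theta\,] \ge E[\,|\bar U_N - g_N(\theta)|^2 \mid \Theta = \theta\,] - \varepsilon^2 \ge \mathrm{Var}(\bar U_N \mid \Theta = \theta) - \varepsilon^2.$

By the consistent estimation of $\Theta$, $\bar U_N - g_N(\theta) \to 0$ in probability as $N \to \infty$; that is,

$P[\,|\bar U_N - g_N(\theta)| > \varepsilon \mid \Theta = \theta\,] \to 0$ as $N \to \infty$.

Thus, using (3.11) and (3.14), for each $\varepsilon > 0$,

$\frac{1}{N^2} \Big[ \sum_{i=1}^{N} \mathrm{Var}(U_i \mid \Theta = \theta) + 2 \sum_{1 \le i < j \le N} \mathrm{Cov}(U_i, U_j \mid \Theta = \theta) \Big] - \varepsilon^2$

has no positive limit points. But

$\sum_{i=1}^{N} \mathrm{Var}(U_i \mid \Theta = \theta)/N^2 \to 0$

as $N \to \infty$, and $\varepsilon > 0$ is arbitrary. Thus, for each $\theta$,

(3.15)  $\frac{1}{N^2} \sum_{1 \le i < j \le N} \mathrm{cov}(U_i, U_j \mid \Theta = \theta)$

has no positive limit points. But when (3.12) is used for each $\theta$, the expression in (3.15) has no negative limit points. Thus

(3.16)  $\frac{1}{N^2} \sum_{1 \le i < j \le N} \mathrm{cov}(U_i, U_j \mid \Theta = \theta) \to 0$

as $N \to \infty$. But this same argument implies, for fixed $\varepsilon > 0$ and any nonsparse sequence $\{C_N, N \ge 1\}$, that

(3.17)  $\frac{1}{N(C_N)^2} \sum_{i, j \in C_N,\ i \ne j} \mathrm{cov}(U_i, U_j \mid \Theta = \theta) \to 0$

as $N \to \infty$.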


Now suppose that $D_N(\theta) \not\to 0$ as $N \to \infty$. Then it is easily seen that for some $\varepsilon > 0$ there exist a subsequence $N'$ of the positive integers, subsets $C_{N'} \in \mathcal{C}_{N'}(\varepsilon)$, and $\varepsilon' > 0$ such that

(3.18)  $\Big| \frac{1}{N(C_{N'})^2} \sum_{i, j \in C_{N'},\ i \ne j} \mathrm{Cov}(U_i, U_j \mid \Theta = \theta) \Big| > \varepsilon'$

for all $N'$ and such that every summand is the same sign. But this clearly contradicts (3.17). Thus $D_N(\theta) \to 0$ as $N \to \infty$, establishing EI for $\{\mathbf{U}_N, \Theta\}$ and hence the desired essential unidimensionality. □

Remarks. It is interesting to note that Theorem 3.2 allows the consistent estimation of ability even if the IRFs are unknown to the practitioner. That is, use of $\bar U_N$ to estimate $A_N(\theta)$ does not require knowledge of the form of $A_N(\theta)$. As long as no attempt is being made to establish a standardized ability scale across tests (e.g., as a precursor to equating tests), knowledge of the IRFs is not required. Moreover, consistent estimation of ability with unknown IRFs is possible in several populations being administered the same test; see Section 5. Also note that the proof of Theorem 3.2 makes clear that when $\Theta$ can be consistently estimated, $g_{C_N}(\theta) = A_{C_N}(\theta)$ always works in (3.6).

It is a foundationally relevant fact that essential unidimensionality implies, under a mild and natural regularity condition for $\{\mathbf{U}_N\}$, that the ability is, in a certain sense, unique, as Theorem 3.3 below asserts.

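Definition 3.3. Let a sequence of tests $\{\mathbf{U}_N\}$ be essentially unidimensional with respect to ability $\Theta$. Suppose for every fixed $\theta_1$ such that $\theta_1$ is in the range $R$ of $\Theta$ (i.e., $P[\Theta \in R] = 1$ with $R$ "minimal") that there exist $\varepsilon_{\theta_1} > 0$ and an open neighborhood $H_{\theta_1}$ of $\theta_1$ such that for all $\theta \ne \theta_1$ in $H_{\theta_1}$ and in the range $R$ of $\Theta$,

(3.19)  $\dfrac{A_N(\theta) - A_N(\theta_1)}{\theta - \theta_1} \ge \varepsilon_{\theta_1} > 0$ for all $N$.

Then $\{\mathbf{U}_N, \Theta\}$ is said to be locally asymptotically discriminating (LAD) with respect to $\Theta$.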

Remark. What LAD really supposes is that $A_N$ is increasing faster than some positive-slope linear function in some neighborhood of $\theta$, for every $\theta$, independent of $N$.

Theorem 3.3. Suppose $\{\mathbf{U}_N\}$ is essentially unidimensional with respect to $\Theta$ and $\Theta'$. Let the corresponding marginal item response functions be denoted by

$P_i(\theta) = E(U_i \mid \Theta = \theta), \qquad P_i'(\theta') = E(U_i \mid \Theta' = \theta')$

for all $\theta$, $\theta'$. Suppose $\{\mathbf{U}_N, \Theta\}$ is LAD with respect to $\Theta$. There then exists a function $g$ defined on the range $R'$ of $\Theta'$ such that

$\Theta = g(\Theta')$, $g$ nondecreasing,

and the range of $g$ is $R$.

Remarks. Since a d=1, M, LI model is also an EI model, note that Theorem 3.4 holds for d=1, M, LI models as well. Thus Theorem 3.4 may be of interest even if one does not wish to use EI in IRT modeling.

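Proof of Theorem 3.3. By Theorem 3.2, for each $\theta$ and $\theta'$,

(3.20)  $\bar U_N - A_N(\theta) \to 0$

in probability given $\Theta = \theta$ (and hence on any subset of $[\Theta = \theta]$) and

(3.21)  $\bar U_N - A_N'(\theta') \to 0$

in probability given $\Theta' = \theta'$ (and hence on any subset of $[\Theta' = \theta']$), where $A_N(\theta) = E[\bar U_N \mid \Theta = \theta]$ and $A_N'(\theta') = E[\bar U_N \mid \Theta' = \theta']$. Let

$G_{\theta, \theta'} = [\Theta = \theta] \cap [\Theta' = \theta']$

for all $\theta$, $\theta'$. Then, for each $\theta$, $\theta'$ such that $G_{\theta, \theta'} \ne \emptyset$, (3.20) and (3.21) imply on $G_{\theta, \theta'}$ that

(3.22)  $A_N(\theta) - A_N'(\theta') \to 0.$

Fix $\theta' \in R'$ and let, denoting the empty set by $\emptyset$,

$B_{\theta'} = \{\theta : G_{\theta, \theta'} \ne \emptyset\}.$

Note that $B_{\theta'} \ne \emptyset$ for all $\theta' \in R'$, since each examinee has an ability value for both $\Theta$ and $\Theta'$. Suppose $\theta_1 \ne \theta_2$ with $\theta_1 \in B_{\theta'}$, $\theta_2 \in B_{\theta'}$, and $\theta_2 > \theta_1$ without loss of generality. Then (3.22) implies that

$A_N(\theta_2) - A_N(\theta_1) \to 0$ as $N \to \infty$.

That is,

$\frac{1}{N} \sum_{i=1}^{N} [P_i(\theta_2) - P_i(\theta_1)] \to 0,$

contradicting (3.19). Thus $B_{\theta'}$ consists of a unique $\theta$ for each $\theta'$; i.e., a function $g$ is defined:

$\theta = g(\theta')$ for all $\theta' \in R'$.

Choose $\theta_2' > \theta_1'$ with $\theta_1' \in R'$, $\theta_2' \in R'$. Then define $\theta_1 = g(\theta_1')$ and $\theta_2 = g(\theta_2')$, and note that

$A_N'(\theta_2') - A_N'(\theta_1') \ge 0$

because the essentially unidimensional model $\{\mathbf{U}_N, \Theta'\}$ is M. By the definition of $g$, recalling (3.22), it follows that $A_N(\theta_2) - A_N(\theta_1)$ has no negative limit points. Thus $\theta_2 \ge \theta_1$ by monotonicity of the $P_i(\theta)$s. That is, $g$ is monotone nondecreasing and well defined for all $\theta' \in R'$.

Because $[\Theta' = \theta'] \subset [\Theta = g(\theta')]$, the probability space, say $\Omega$, satisfies

$\Omega = \bigcup_{\theta' \in R'} [\Theta' = \theta'] \subset \bigcup_{\theta' \in R'} [\Theta = g(\theta')]$

and it follows that the range of $g$ is $R$. □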

Remarks. (1) Note that the theorem does not claim that $g$ is strictly increasing. That is, the rescaling given by $g$ could assign many $\theta'$ to the same $\theta$. Because no assumption analogous to (3.19) was made for $\Theta'$, this is of course expected, for the $\Theta'$ scale could produce a finer partition than needed to achieve essential unidimensionality. Thus the collapsing of distinct $\theta'$ into a single $\theta$ cannot be ruled out. The essential point is that, if for the $\Theta'$ scale there exists an interval $[a, b]$ such that

$\dfrac{d}{d\theta'} P_i'(\theta') = 0$ for all $i$, $\theta' \in [a, b]$,

then the $\Theta'$ scale should be rescaled so that all $\theta' \in [a, b]$ are collapsed to a single point. However, assuming (3.19) for $\Theta'$ as well does imply a strictly increasing $g$.

(2) In a private communication, Brian Junker has pointed out that an alternate proof of Theorem 3.4 can be given that produces $g$ explicitly. It seems worthwhile to describe this construction: By the Helly Selection Theorem for uniformly bounded increasing functions (Billingsley [1968], p. 227) one can exhibit an integer subsequence $\{N_k\}$ and functions $A(\theta)$, $A'(\theta')$ such that for all $\theta$, $\theta'$

$A(\theta) = \lim_{k \to \infty} A_{N_k}(\theta), \qquad A'(\theta') = \lim_{k \to \infty} A'_{N_k}(\theta').$

By (3.19), $A(\theta)$ can be shown to be invertible. Junker's proof then shows that $g$ can be defined by

(3.23)  $g(\theta') = A^{-1}(A'(\theta')).$
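The construction in (3.23) can be illustrated numerically. The following is a hedged sketch under assumptions that are not in the paper: a fixed three-item test stands in for the limit along the subsequence, the IRFs are hypothetical two-parameter logistic curves, and the second scale is taken to be $\theta' = e^{\theta}$, so that $P_i'(\theta') = P_i(\log \theta')$. Inverting $A$ by bisection then recovers $g(\theta') = \log \theta'$.

```python
import math

# Hypothetical (discrimination, difficulty) pairs; not taken from the paper.
ITEMS = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]


def irf(theta, a, b):
    """Assumed two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def A(theta):
    """Average of the IRFs on the theta scale (stand-in for the limit A)."""
    return sum(irf(theta, a, b) for a, b in ITEMS) / len(ITEMS)


def A_prime(theta_prime):
    """Average of the IRFs on the assumed second scale theta' = exp(theta)."""
    return A(math.log(theta_prime))


def invert_A(target, lo=-20.0, hi=20.0, iters=200):
    """Solve A(theta) = target by bisection; valid because A is strictly increasing."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if A(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def g(theta_prime):
    """Rescaling g = A^{-1} composed with A', as in (3.23)."""
    return invert_A(A_prime(theta_prime))


for tp in (0.5, 1.0, 2.0, math.e):
    print(tp, g(tp), math.log(tp))
```

Bisection is used only because it needs nothing beyond monotonicity of $A$, which is the property that (3.19) is invoked to secure.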


(3)  Note  that  the  item  pool  formulation  was  essential  for  establishing  the 
uniqueness  of  ability  scale.  It  is  the  author's  position  that  an  item  pool 
formulation  with  its  implicit  requirement  of  infinitely  many  items  greatly  aids 
the  study  of  many  foundational  IRT  issues.  Indeed,  that  is  a  major  point  of 
this  research. 


It is an axiom of psychometrics that a "test" should be unidimensional. If not, it should be broken up into a battery of unidimensional subtests, each to be analyzed separately. Thus, in the context of this paper, the axiom becomes that a test should be essentially unidimensional. In this context, the following example shows that it is possible to construct a sequence of tests $\{\mathbf{U}_N, N \ge 1\}$ that is not essentially unidimensional. Thus, the concept of essential unidimensionality is not mathematically vacuous.


Example 3.3. Let, for each $r \ge 1$, $U_i = U_j$ for $2^{(r-1)^2} < i, j \le 2^{r^2}$, and $U_i$ independent of $U_j$ otherwise. Let $P[U_i = 1] = P[U_i = 0] = 1/2$ for all $i$ define the marginal distributions. Then, letting

$\bar U^{(r^2)} = \sum_{i = 2^{(r-1)^2} + 1}^{2^{r^2}} U_i \Big/ \big(2^{r^2} - 2^{(r-1)^2}\big),$

it follows that

(3.24)  $P[\bar U^{(r^2)} = 1] = P[\bar U^{(r^2)} = 0] = 1/2,$

that $\{\bar U^{(r^2)}, r \ge 1\}$ are independent, and that with probability one,

(3.25)  $\bar U_{2^{r^2}} \ge 1 - \dfrac{1}{2^{2r-1}}$ or $\bar U_{2^{r^2}} \le \dfrac{1}{2^{2r-1}}.$

Suppose essential unidimensionality for $\{\mathbf{U}_N, \Theta\}$. Then by Theorem 3.2, given $\Theta = \theta$,

$\bar U_N - A_N(\theta) \to 0$

in probability as $N \to \infty$. Thus, given $\Theta = \theta$,

$\bar U_{2^{r^2}} - A_{2^{r^2}}(\theta) \to 0$

in probability as $r \to \infty$. Thus, by a standard probability argument, there exists a subsequence $r_i^2$ such that given $\Theta = \theta$

$\bar U_{2^{r_i^2}} - A_{2^{r_i^2}}(\theta) \to 0$

with probability one as $i \to \infty$. Thus, given $\Theta = \theta$,

(3.26)  $\bar U^{(r_i^2)} - A_{2^{r_i^2}}(\theta) \to 0$

with probability one as $i \to \infty$. Because M is part of the assumption of essential unidimensionality, $A_{2^{r_i^2}}(\theta)$ is nondecreasing in $\theta$. A contradiction of this is now obtained. By (3.24), $\{\bar U^{(r_i^2)}, i \ge 1\}$ is a sequence of independent identically distributed random variables with marginal distribution $p(0) = p(1) = 1/2$. Let, for $i \ge 1$ and fixed $1/16 > \varepsilon > 0$,

$A_{i,0} = [A_{2^{r_i^2}}(\Theta) \le \varepsilon], \qquad A_{i,1} = [A_{2^{r_i^2}}(\Theta) \ge 1 - \varepsilon].$


By (3.25), (3.26), and the distribution of $\{\bar U^{(r_i^2)}, i \ge 1\}$, we choose $i'$ so large that

(3.27)  $P[A_{i',0}] \ge 1/2 - \varepsilon$, $P[A_{i',1}] \ge 1/2 - \varepsilon$, $P[A_{i'+1,0}] \ge 1/2 - \varepsilon$, $P[A_{i'+1,1}] \ge 1/2 - \varepsilon$

and further that

(3.28)  $P[A_{i',0} \cap A_{i'+1,1}] \ge 1/4 - \varepsilon$, $P[A_{i',1} \cap A_{i'+1,0}] \ge 1/4 - \varepsilon$.

Because the events in (3.28) each have positive probability, they are each nonempty events of the probability space, $\Omega$ say. This will produce a contradiction as follows: Let $\omega_0$ and $\omega_1$ be outcomes of $\Omega$ such that

$\omega_0 \in A_{i',0} \cap A_{i'+1,1}, \qquad \omega_1 \in A_{i',1} \cap A_{i'+1,0}.$

Then obviously $\omega_0 \in A_{i',0}$ and $\omega_1 \in A_{i',1}$, so that $\Theta(\omega_0) < \Theta(\omega_1)$ by the monotonicity of $A_{2^{r_{i'}^2}}(\theta)$. But, similarly, $\omega_1 \in A_{i'+1,0}$ and $\omega_0 \in A_{i'+1,1}$, so that $\Theta(\omega_1) < \Theta(\omega_0)$, a contradiction.
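The dichotomy in (3.25) is a purely arithmetic consequence of the block structure and can be verified exhaustively for small $r$. The sketch below is illustrative and rests on one reading of the construction: item 1 stands alone, and block $k$ consists of the items with index in $(2^{(k-1)^2}, 2^{k^2}]$, all of which share a single 0/1 value. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def proportion_correct(first_item, block_values):
    """Exact proportion correct over the first 2**(r*r) items.

    first_item: 0/1 response to item 1, which lies in no block.
    block_values: 0/1 value shared by all items of blocks 1..r.
    """
    r = len(block_values)
    total = first_item
    for k, value in enumerate(block_values, start=1):
        block_size = 2 ** (k * k) - 2 ** ((k - 1) ** 2)
        total += value * block_size
    return Fraction(total, 2 ** (r * r))


def satisfies_3_25(first_item, block_values):
    """Check that the proportion correct is within 2**-(2r-1) of 0 or of 1."""
    r = len(block_values)
    bound = Fraction(1, 2 ** (2 * r - 1))
    u_bar = proportion_correct(first_item, block_values)
    return u_bar >= 1 - bound or u_bar <= bound


# Exhaustive check over every response configuration for r = 1..4.
all_ok = all(
    satisfies_3_25(first, blocks)
    for r in range(1, 5)
    for first in (0, 1)
    for blocks in product((0, 1), repeat=r)
)
print(all_ok)
```

The last block dominates because the earlier items make up only a fraction $2^{-(2r-1)}$ of the first $2^{r^2}$ items, which is why the proportion correct keeps jumping between values near 0 and values near 1.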

Theorem 3.2 has an interesting multidimensional analogue. Let, for a latent model $\{\mathbf{U}_N, \boldsymbol{\Theta}\}$ with item response functions $\{P_i(\boldsymbol{\theta}), 1 \le i \le N\}$,

(3.29)  $A_N(\boldsymbol{\theta}) = \sum_{i=1}^{N} P_i(\boldsymbol{\theta})/N$

(the distinction between $A_N(\theta)$ and $A_N(\boldsymbol{\theta})$ henceforth assumed clear from context).

Theorem 3.4. Suppose essential $d_E$ dimensionality with respect to ability $\boldsymbol{\Theta}$. Then $\boldsymbol{\Theta}$ is able to be consistently estimated in probability in the sense of (3.6) with $g_{C_N}(\theta)$ replaced by $g_{C_N}(\boldsymbol{\theta})$ in (3.6).

Suppose that essential $d_E$ dimensionality fails for $\{\mathbf{U}_N\}$. Then there do not exist a $d_E$ dimensional $\boldsymbol{\Theta}$ and accompanying functions $g_{C_N}(\boldsymbol{\theta})$ such that $\{\mathbf{U}_N, \boldsymbol{\Theta}\}$ is monotone and for each given $\boldsymbol{\Theta} = \boldsymbol{\theta}$

given, into an operational definition with only the "observed" data given, that is, with the distribution of the test given. Of course, in any statistical application the "given" distribution of $\mathbf{U}_N$ is observed only with error because only a finite amount of data is ever available. With only the distribution of $\mathbf{U}_N$ given, the definition must contend with the essential nonuniqueness of the IRFs and accompanying scale for $\Theta$.

Definition 4.1. Let $\mathbf{U}_N^{(B)}$ and $\mathbf{U}_N^{(C)}$ represent a test administered to two populations, B and C. Then d-dimensional invariance holds provided each test administration has a d-dimensional M, LI representation using the same item response functions. That is, for each $\mathbf{u} = (u_1, \ldots, u_N)$, with $\boldsymbol{\theta}$ and $\boldsymbol{\theta}'$ d-dimensional,

(4.1)  $P[\mathbf{U}_N^{(B)} = \mathbf{u}] = \int \prod_{i=1}^{N} [P_i(\boldsymbol{\theta})]^{u_i} [1 - P_i(\boldsymbol{\theta})]^{1 - u_i} \, dP^{(B)}(\boldsymbol{\theta})$

and

(4.2)  $P[\mathbf{U}_N^{(C)} = \mathbf{u}] = \int \prod_{i=1}^{N} [P_i(\boldsymbol{\theta}')]^{u_i} [1 - P_i(\boldsymbol{\theta}')]^{1 - u_i} \, dP^{(C)}(\boldsymbol{\theta}')$

where $P^{(B)}$ and $P^{(C)}$ are arbitrary distributions on $R^d$, d dimensional Euclidean space.

Remarks on Definition 4.1. (1) A key point to note is that the ability distributions $P^{(B)}$ and $P^{(C)}$ are arbitrary and in no way required to be related to one another. This amounts to allowing an arbitrary choice of ability metric for each population in an effort to obtain the same item response functions $\{P_i(\theta)\}$ in (4.1) and (4.2). The two metrics need not be the same in any mathematical or psychological sense. Nevertheless, once statistical evidence is given that (4.1) and (4.2) hold, it is standard IRT practice to declare that a common ability metric has been found.


(2) In applications, because the latent ability is usually assumed to be unidimensional, "invariance" usually means unidimensional invariance. For example, when invariance is used to justify a technique for identifying biased items, then the practitioner surely has unidimensional invariance in mind (see Lord (1980), Chapter 14, for example).

(3) Of course, once the IRFs for a model are specified, then invariance holds for all subpopulations of $\Theta$. For an IRT model, once specified, by its very structure assigns to each examinee a fixed $\theta$. Thus altering the distribution of $\Theta$ by choosing a subpopulation of examinees cannot change the IRFs. The distribution of $\mathbf{U}_N$ for the subpopulation is then derived from (4.2) with $P^{(C)}(\theta)$ the subpopulation distribution and the IRFs identical to those in (4.1), the expression for the entire population. Thus, the Lord viewpoint of fixing the latent variable $\Theta$ is appropriate when focusing on a subpopulation after the IRT model has been specified.


(4)  Note  that  (4.1)  and  (4.2)  really  state  that  populations  B  and  C  being 
administered  the  test  each  can  be  modeled  by  a  M  d=l  LI  model. 

The following idealized example, in the author's opinion, illustrates a fundamental flaw in the uncritical application of invariance.

Example 4.1. Consider two populations of examinees, males and females say. Let $\Theta$ denote the unidimensional ability intended to be measured. Let $P_i(\theta)$, $1 \le i \le N$, denote a family of item response functions that satisfies (4.1) for males. Suppose that the items are uniformly biased against females in the sense

$P[U_i = 1 \mid \text{female of ability } \theta] = P_i(\theta - 1)$ for all $i$, $\theta$, and

$P[U_i = 1 \mid \text{male of ability } \theta] = P_i(\theta)$ for all $i$, $\theta$.

Thus, for females, for all $\mathbf{u}$, with $P^F$ denoting the distribution of ability for females,

$P[\mathbf{U}_N = \mathbf{u}] = \int \prod_{i=1}^{N} [P_i(\theta - 1)]^{u_i} [1 - P_i(\theta - 1)]^{1 - u_i} \, dP^F(\theta).$

But a simple change of variable $\theta' = \theta - 1$ yields

$P[\mathbf{U}_N = \mathbf{u}] = \int \prod_{i=1}^{N} [P_i(\theta')]^{u_i} [1 - P_i(\theta')]^{1 - u_i} \, dP^F(\theta' + 1).$

Thus (4.2) holds with $P^F(\theta' + 1)$ the new ability distribution. Unidimensional invariance therefore holds, in spite of the pervasive (and uniform) sex bias in the test. □
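The change of variable in Example 4.1 can be made concrete with a small computation. The sketch below is illustrative only: the logistic IRFs, the three-item test, and the discrete female ability distribution are all hypothetical choices, not taken from the paper. It computes every response-pattern probability twice, once under the biased model with the true ability distribution and once under the unbiased IRFs with the ability distribution shifted down by one unit, and shows that the two sets of probabilities coincide.

```python
import math
from itertools import product

# Hypothetical (discrimination, difficulty) pairs for a three-item test.
ITEMS = [(1.0, -0.5), (1.2, 0.0), (0.7, 0.8)]


def irf(theta, a, b):
    """Assumed two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_prob(u, ability_dist, shift=0.0):
    """P[U = u] under local independence for a discrete ability distribution.

    ability_dist: list of (theta, weight) pairs with weights summing to 1.
    shift: uniform bias added to theta inside every IRF (-1 models the bias).
    """
    total = 0.0
    for theta, weight in ability_dist:
        likelihood = 1.0
        for (a, b), ui in zip(ITEMS, u):
            p = irf(theta + shift, a, b)
            likelihood *= p if ui == 1 else 1.0 - p
        total += weight * likelihood
    return total


# Assumed true female ability distribution and its relabelled version theta' = theta - 1.
female_true = [(-1.0, 0.3), (0.0, 0.4), (1.0, 0.3)]
female_relabelled = [(theta - 1.0, w) for theta, w in female_true]

for u in product((0, 1), repeat=len(ITEMS)):
    biased = pattern_prob(u, female_true, shift=-1.0)
    unbiased_fit = pattern_prob(u, female_relabelled)
    print(u, round(biased, 6), round(unbiased_fit, 6))
```

Since the observable distribution of response patterns is identical in the two descriptions, no analysis of the female response data alone can distinguish uniform bias from a lower ability distribution, which is the point of the example.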

The example is certainly idealized. For example, some items would surely be more biased than others in any actual application. But it represents a serious practical problem, in the author's view. Unidimensional invariance is no guarantee against failure to identify pervasive bias. What is really going on is that, if administered simultaneously to males and females, the test is driven by a two-dimensional latent variable $(\Theta_1, \Theta_2)$, where $\Theta_1$ is the ability to be measured and $\Theta_2$ (= -1 for females, = 0 for males, say) measures the degree of bias. For example, $\Theta_1$ could be mathematical ability and $\Theta_2$ could be familiarity with computers. However, the above example is easily seen to be unidimensional in the traditional sense. For, let $\Theta = \Theta_1 + \Theta_2$. Then

(4.3)  $P[\mathbf{U}_N = \mathbf{u}] = \int \prod_{i=1}^{N} [P_i(\theta)]^{u_i} [1 - P_i(\theta)]^{1 - u_i} \, dP(\theta)$

where $P(\theta) = P^F(\theta + 1)$. Thus, pervasive bias is possible even when traditional unidimensionality holds. In this regard the following easily proved theorem is relevant.

Theorem 4.1. Let $\mathbf{U}_N^{(B)}$ and $\mathbf{U}_N^{(C)}$ represent a test administered to two populations, B and C. Then unidimensional invariance holds if and only if traditional unidimensionality holds.


Proof. Assume traditional unidimensionality. Then (4.3) holds for monotone IRFs $P_i(\theta)$. But then (really the context of Remark (3) above)

$P[\mathbf{U}_N = \mathbf{u} \mid B] = \int \prod_{i=1}^{N} [P_i(\theta)]^{u_i} [1 - P_i(\theta)]^{1 - u_i} \, dP_B(\theta)$

follows trivially, where $P_B(\theta) = P(\theta \mid B)$. The same holds for C; thus, unidimensional invariance holds.

Assume unidimensional invariance; i.e., (4.1) and (4.2) with $\theta$, $\theta'$ real valued. Let $\Theta''$ be the $\Theta$ of (4.1) for all Population B examinees and $\Theta''$ be the $\Theta'$ of (4.2) for all Population C examinees. Thus, each examinee of $B \cup C$ is assigned a unidimensional $\Theta''$. Then, for some $0 < p < 1$,

$P[\mathbf{U}_N = \mathbf{u}] = P[\mathbf{U}_N = \mathbf{u} \mid B]\, p + P[\mathbf{U}_N = \mathbf{u} \mid C]\,(1 - p)$

$= \int \prod_{i=1}^{N} [P_i(\theta'')]^{u_i} [1 - P_i(\theta'')]^{1 - u_i} \, d\{p P^{(B)}(\theta'') + (1 - p) P^{(C)}(\theta'')\}.$

Clearly, letting $P''(\theta'') = p P^{(B)}(\theta'') + (1 - p) P^{(C)}(\theta'')$ completes the proof. □

Remark. The theorem shows us that (unidimensional) invariance is simply traditional unidimensionality.

Theorem 4.1 and the results concerning essential unidimensionality suggest that unidimensional invariance be redefined so that it dovetails with essential unidimensionality.

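Definition 4.2. Let $\{\mathbf{U}_N^{(B)}, N \ge 1\}$ and $\{\mathbf{U}_N^{(C)}, N \ge 1\}$ represent a sequence of tests administered to two populations, B and C. Then essential d dimensional invariance holds provided each test administered has an essential d-dimensional representation using the same latent model representation $P[\mathbf{u}_N \mid \boldsymbol{\Theta} = \boldsymbol{\theta}]$. That is, for each $\mathbf{u} = (u_1, \ldots, u_N)$, with $\boldsymbol{\theta}$ and $\boldsymbol{\theta}'$ d-dimensional,

(4.4)  $P[\mathbf{U}_N^{(B)} = \mathbf{u}] = \int P[\mathbf{u} \mid \boldsymbol{\Theta} = \boldsymbol{\theta}] \, dP^{(B)}(\boldsymbol{\theta})$

and

(4.5)  $P[\mathbf{U}_N^{(C)} = \mathbf{u}] = \int P[\mathbf{u} \mid \boldsymbol{\Theta} = \boldsymbol{\theta}'] \, dP^{(C)}(\boldsymbol{\theta}')$

where $P^{(B)}$ and $P^{(C)}$ are arbitrary distributions on $R^d$, d dimensional Euclidean space, and (4.4) and (4.5) each define essential d-dimensional models.

The analogue of Theorem 4.1 is trivial to state and prove.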

Theorem 4.2. Let $\mathbf{U}_N^{(B)}$ and $\mathbf{U}_N^{(C)}$ represent a test administered to two populations, B and C. Then essential unidimensional invariance holds if and only if essential unidimensionality holds.

Proof. Same as that of Theorem 4.1, except for minor details. □

Example 4.1 compels us to be cautious concerning the centrality of the concept of invariance in IRT modeling. For unidimensional invariance (whether essential or traditional) clearly does not preclude the inappropriate assignment of a common metric to the underlying ability of interest in a single-test, two-population problem. Can something be substituted for unidimensional invariance that will rule out such faulty applications? We suggest that the central property that must hold in such single-test, multiple-population applications is essential unidimensionality together with the conclusion that the underlying essentially unidimensional ability $\Theta$ is the ability intended to be measured. In the above example, $\Theta_1$ was the ability intended to be measured rather than $\Theta = \Theta_1 + \Theta_2$. This suggests the following definition.

Definition 4.3. A test sequence $\{\mathbf{U}_N, N \ge 1\}$ is said to be valid provided (i) it is essentially unidimensional with respect to $\Theta$ and (ii) $\Theta$ is the ability desired to be measured.

Certain results in Section 3 support the appropriateness of this definition. First, Theorem 3.3 states that, under mild regularity conditions, essential unidimensionality with respect to $\Theta$ guarantees that, up to monotone transformations, the $\Theta$ of the model is unique. That is, essential unidimensionality makes the measurement of $\Theta$ well defined in the sense that $\Theta$ itself is unique and hence well defined. Second, Theorem 3.2 shows that $\mathbf{U}_N$, through computation of the statistic $\bar U_N$, can be used to consistently estimate $\Theta$ by use of the rescaling $A_N(\theta)$. That is, the data can be used operationally to obtain $\Theta$, as one would expect a "valid" test to be able to do.

Of  course,  IRT  validity  as  defined  above  requires  both  essential 

unidimensionality  and  that  the  underlying  latent  ability  9  is  the  ability 

intended  to  be  measured.  Statistical  analysis  of  data  from  the  administration 

of  a  test  cannot  in  the  absence  of  additional  data  concerning  other  valid  tests, 

external  criteria,  etc.  be  used  to  ascertain  whether  the  latent  ability  being 

measured  is  the  one  intended.  However,  statistical  analysis  of  data  from  the 

administration  of  a  test  can  be  used  to  assess  whether  the  prerequisite 

essential  unidimensionality  holds.  Moreover,  as  remarked  above,  the  author's 

(Stout,  1987)  statistical  test  of  unidimensionality  is  designed  to  address 

precisely  this  question  of  whether  essential  unidimensionality  holds. 

One final point must be emphasized. If essential unidimensionality holds for a combined multiple-population test, then it is purely a matter of taste and convenience which transformation of the underlying ability $\Theta$ is used for the ability scale. In Section 3, $A_N(\theta)$ is used for the one population case because it makes the basic estimation results especially easy to formulate. Clearly, if one wishes to use a common metric for two or more populations being administered the same test, then the $A_N(\theta)$ of the combined superpopulation is totally appropriate. That is, the theory of Section 3 easily extends to the fixed test, multiple population setting. This is developed in Section 5.


5. Two Group Test Bias. In this section, we will apply the theory of Section 3 to the situation in which the test population is assumed to consist of two groups of examinees, B and C. The main objective is to assess whether the estimation of ability is somehow "unfair" to Group B as compared with Group C, or vice versa. Let $\Theta$ be the unidimensional ability intended to be measured in the combined population. Let $\Theta^B$ and $\mathbf{U}_N^B$ denote respectively the ability intended to be measured and the test scores of a randomly chosen examinee from Group B. Define $\Theta^C$ and $\mathbf{U}_N^C$ similarly. Note that by definition $\Theta = \Theta^B$ for each Group B examinee and $\Theta = \Theta^C$ for each Group C examinee.

The results of Section 3 suggest that essential unidimensionality implies consistent estimation of $\Theta$ in each group.

Theorem 5.1. If essential unidimensionality holds for $\Theta$ in the combined population consisting of Group B and Group C examinees, then $\Theta$ is able to be consistently estimated in each population using the $A_N(\theta) = E[\bar U_N \mid \Theta = \theta]$ scale computed from the combined population.

Proof. Fix $\theta$. By Theorem 3.2, given $\Theta = \theta$,

(5.1)    $\bar{U}_N - A_N(\theta) \to 0$

in probability as $N \to \infty$. Let $\mathcal{B}$ denote the event that a randomly
sampled examinee (according to the distribution of $\Theta$) is a Group B examinee.
Fix $\epsilon > 0$. Let $G_N = \{ |\bar{U}_N - A_N(\theta)| < \epsilon \}$. It is an
elementary fact of probability that $P[G_N] \to 1$ and $P[\mathcal{B}] > 0$ implies
that $P[G_N \mid \mathcal{B}] \to 1$. Thus given $\Theta^B = \theta$, it follows that

(5.2)    $\bar{U}_N^B - A_N(\theta) \to 0$

in probability as $N \to \infty$. The argument is the same for any nonsparse sequence
$\{C_N, N \ge 1\}$. Thus, the result is proved for Group B. The argument is the same
for Group C. □


Since essential unidimensionality guarantees consistent estimation of $\Theta$
in each group, this suggests that a natural setting for the study of test bias
is under the assumption that $\{\underline{U}_N, \underline{\Theta}\}$ is essentially
$d_E$ dimensional for some $d_E > 1$. Essential $d_E$ dimensionality for some
$d_E > 1$ is assumed throughout the remainder of Section 5. We suppose throughout
Section 5 that $\underline{\Theta}$ determines $\Theta$; i.e., $\Theta$ is a function
of $\underline{\Theta}$. Then, without loss of generality, we assume that the first
component of $\underline{\Theta}$ is the ability $\Theta$ intended to be measured.
Thus $\underline{\Theta} = (\Theta, \underline{\Theta}_2)$ where $\underline{\Theta}_2$
consists of "nuisance" abilities. We first note that essential $d_E$ dimensionality
guarantees that the proportion correct consistently estimates
$A_N(\underline{\theta})$ among all examinees of ability $\underline{\theta}$
regardless of their group membership.

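Theorem 5.2. Let $\underline{\Theta}^B$ and $\bar{U}_N^B$ denote respectively the
ability and the test score of a randomly chosen examinee from Group B. Define
$\underline{\Theta}^C$ and $\bar{U}_N^C$ analogously. Then, for each nonsparse
sequence $\{C_N, N \ge 1\}$ and for each $\underline{\theta}$, given
$\underline{\Theta}^B = \underline{\theta}$,

(5.3)    $\bar{U}_N^B - A_N(\underline{\theta}) \to 0$

in probability as $N \to \infty$, and for each $\underline{\theta}$, given
$\underline{\Theta}^C = \underline{\theta}$,

(5.4)    $\bar{U}_N^C - A_N(\underline{\theta}) \to 0$

in probability as $N \to \infty$.

Proof. Essentially the same as that of Theorem 5.1. □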

We  propose  the  following  definition  of  test  bias. 

Definition 5.1. We say that there is no test bias in the estimation of $\Theta$ if,
for each $\theta$, the distribution of $\underline{\Theta}_2^B$ given
$\Theta^B = \theta$ is equal to that of $\underline{\Theta}_2^C$ given
$\Theta^C = \theta$.




Remark. It is essential to note that this definition of test bias places no
restrictions on the distributions of the ability to be measured for Group B and
Group C. The existence of test bias rests in the conditional distributions of
$\underline{\Theta}_2^B$ given $\Theta^B$ and $\underline{\Theta}_2^C$ given
$\Theta^C$, and not in the marginal distributions of $\Theta^B$ and $\Theta^C$.
The point is that bias is not to be mistaken for a genuine difference
between the two groups in the ability $\Theta$ to be measured. Rather, it rests on
group ability differences in other attributes also influencing the test in an
essential way among examinees with the same $\Theta$ ability.

It is useful to illustrate the role of Definition 5.1 with a simple
example.


Example 5.1. Suppose essential 2-dimensionality for $\underline{\Theta}$. Let
$\underline{\Theta} = (\Theta, \Theta_2)$ and fix $\theta = 0$. Let $\Theta_2$ be a
discrete random variable with range $\{0, 1\}$. Suppose for Group B that, given
$\Theta^B = 0$, then $\Theta_2^B = 0$ with probability 3/4 and that for Group C,
given $\Theta^C = 0$, then $\Theta_2^C = 1$ with probability 3/4. Recall that

    $A_N(\underline{\theta}) = E[\bar{U}_N \mid \underline{\Theta} = \underline{\theta}] = \sum_{i=1}^{N} P_i(\underline{\theta}) / N$.

Suppose the item response functions are such that $A_N(0, 0) = 1/8$ and
$A_N(0, 1) = 7/8$ for all $N$. Then, according to Theorem 5.2, given
$\underline{\Theta} = (0, 0)$,

    $\bar{U}_N^B - 1/8 \to 0$ and $\bar{U}_N^C - 1/8 \to 0$,

each in probability as $N \to \infty$. Also, given $\underline{\Theta} = (0, 1)$,

    $\bar{U}_N^B - 7/8 \to 0$ and $\bar{U}_N^C - 7/8 \to 0$,

each in probability as $N \to \infty$. Note that

    $P[\Theta_2^B = 1 \mid \Theta^B = 0] = 1/4$,  $P[\Theta_2^C = 1 \mid \Theta^C = 0] = 3/4$.

Thus test bias in the sense of Definition 5.1 exists. How does this test bias
affect the asymptotic behavior of $\bar{U}_N^B$ and $\bar{U}_N^C$? For Group B,

    $P[\bar{U}_N^B - 1/8 \to 0 \mid \Theta^B = 0] = 3/4$,  $P[\bar{U}_N^B - 7/8 \to 0 \mid \Theta^B = 0] = 1/4$;

while for Group C,

    $P[\bar{U}_N^C - 1/8 \to 0 \mid \Theta^C = 0] = 1/4$,  $P[\bar{U}_N^C - 7/8 \to 0 \mid \Theta^C = 0] = 3/4$.

Thus, even though (5.3) and (5.4) hold, the behaviors of $\bar{U}_N^B$ and
$\bar{U}_N^C$ in their attempts to estimate $\Theta$ (on some scale) clearly favor
Group C over Group B for examinees of ability $\Theta = 0$. Here the asymptotic
distribution of $\bar{U}_N^C$ given $\Theta^C = 0$ is stochastically larger than the
distribution of $\bar{U}_N^B$ given $\Theta^B = 0$. □

Note  that  the  marginal  distributions  of  the  ability  to  be  measured  in  Group  B 

and  in  Group  C  played  no  role  in  the  example. 
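
The behavior described in Example 5.1 can be checked numerically. The following
is a minimal simulation sketch, not part of the original development. It assumes,
purely for illustration, that every item has the same success probability (1/8
when the nuisance ability is 0, 7/8 when it is 1), that items are conditionally
independent, and that all examinees have intended ability 0. The function names,
sample sizes, and seeds are hypothetical choices.

```python
import random


def simulate_scores(p_theta2_is_1, n_examinees=2000, n_items=200, seed=0):
    """Proportion-correct scores for examinees who all have Theta = 0.

    Each examinee first draws the nuisance ability Theta_2 in {0, 1}, then
    answers every item correctly with probability 1/8 (Theta_2 = 0) or
    7/8 (Theta_2 = 1), independently across items (an illustrative assumption).
    """
    rng = random.Random(seed)
    p_correct = {0: 1 / 8, 1: 7 / 8}  # A_N(0, 0) and A_N(0, 1) from the example
    scores = []
    for _ in range(n_examinees):
        theta2 = 1 if rng.random() < p_theta2_is_1 else 0
        n_right = sum(rng.random() < p_correct[theta2] for _ in range(n_items))
        scores.append(n_right / n_items)
    return scores


def summarize(scores):
    """Return (mean score, share of examinees scoring above 1/2)."""
    mean = sum(scores) / len(scores)
    share_high = sum(s > 0.5 for s in scores) / len(scores)
    return mean, share_high


# Group B: P[Theta_2 = 1 | Theta = 0] = 1/4; Group C: the same probability is 3/4.
mean_b, high_b = summarize(simulate_scores(p_theta2_is_1=1 / 4, seed=1))
mean_c, high_c = summarize(simulate_scores(p_theta2_is_1=3 / 4, seed=2))
print(f"Group B: mean score {mean_b:.3f}, share scoring near 7/8: {high_b:.3f}")
print(f"Group C: mean score {mean_c:.3f}, share scoring near 7/8: {high_c:.3f}")
```

Under these assumptions the share of Group C examinees whose score settles near
7/8 should be close to 3/4, against roughly 1/4 in Group B, even though every
simulated examinee has the same intended ability.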

Recall that essential $d_E$ dimensionality for $d_E > 1$ implies by Theorem 3.7
that there do not exist functions $g_N(\theta)$ such that

    $\bar{U}_N - g_N(\Theta) \to 0$

in probability as $N \to \infty$. This precludes consistent estimation of $\Theta$.
However, if there is no test bias (in the sense of Definition 5.1), then using
proportion correct to score the test is guaranteed not to favor either group over
the other asymptotically in any way whatsoever, as the following theorem makes
precise.

Theorem 5.3. Suppose there is no test bias. Let $\{C_N, N \ge 1\}$ be any nonsparse
sequence. Then for each $\theta$, the asymptotic distribution of $\bar{U}_N^B$ given
$\Theta^B = \theta$ is the same as the asymptotic distribution of $\bar{U}_N^C$
given $\Theta^C = \theta$.

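Proof. Fix $\theta$. The argument used to prove Theorem 5.1 is easily modified to
establish that, given $\Theta^B = \theta$,

(5.5)    $\bar{U}_N^B - A_N(\theta, \underline{\Theta}_2^B) \to 0$

in probability as $N \to \infty$, and that, given $\Theta^C = \theta$,

(5.6)    $\bar{U}_N^C - A_N(\theta, \underline{\Theta}_2^C) \to 0$

in probability as $N \to \infty$. Here
$A_N(\underline{\theta}) = A_N(\theta, \underline{\theta}_2)$. The absence of test
bias merely means that the distribution of $\underline{\Theta}_2^B$ given
$\Theta^B = \theta$ is the same as the distribution of $\underline{\Theta}_2^C$
given $\Theta^C = \theta$. The desired result then follows from (5.5) and (5.6). □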


Remarks. (1) The point of the theorem is clear: If test bias does not exist
when $d_E > 1$, then $\bar{U}_N^B$ and $\bar{U}_N^C$ are equally inconsistent in
their respective attempts to estimate $\Theta$. That is, each group is equally
mistreated.

(2) At the end of Section 3, we pointed out that essential unidimensionality
for a latent trait family of models $\{\underline{U}_N, \Theta\}$ still allows for
fixed measurement error for a finite length test administration, this caused by
the presence of abilities other than $\Theta$, these other abilities being
inessential. Clearly, in the two population setting of Section 5, it similarly
holds that this source of measurement error can favor one population over another
in a finite length test administration, even when essential unidimensionality
holds. From a lack of model fit perspective, the issue becomes one of whether the
differences $A_N^B(\theta) - A_N^C(\theta)$, as $\theta$ varies, are too large to be
ignored. Here

    $A_N^B(\theta) = E\left[ \frac{1}{N} \sum_{i=1}^{N} P_i(\underline{\Theta}^B) \,\middle|\, \Theta^B = \theta \right]$

and $A_N^C(\theta)$ is defined similarly, where
$\{\underline{U}_N, \underline{\Theta}, N \ge 1\}$ is assumed to have $d_E = 1$ and
traditional dimensionality $\dim(\underline{\Theta}) > 1$. Because
$\dim(\underline{\Theta}) > 1$, $A_N^B(\theta)$ and $A_N^C(\theta)$, the Group B and
Group C intrinsic ability scales defined by (3.1) and (3.2), will in general be
different.

(3)  The  theoretical  results  of  this  paper  caution  against  the  casual  use 
of  short  tests  with  confidence  that  test  bias  will  not  occur.  For,  the  shorter 
the  test,  the  harder  it  is  to  assess  essential  unidimensionality.  Furthermore, 
even  if  essential  unidimensionality  holds,  the  shorter  the  test,  the  more  likely 
(3.30) (or the opposite inequality) is to hold to a damaging degree. By
contrast, Theorem 5.1 guarantees for a long essentially unidimensional test that
(3.30) will have little to no ill effect on the consistent estimation of the
essential trait in each population.


6. Essential Unidimensionality and Linear Formula Scoring. Thus far we have
presented our thesis in the context of ability estimation with proportion right
used as the estimator of ability. Of course, one of the major contributions of
IRT has been the establishment that the use of a "linear formula score"

    $\sum_{i=1}^{N} a_{i,N} U_i$,  all $a_{i,N} \ge 0$,

can be more appropriate than the use of $\bar{U}_N$. For example, with $N$ fixed in
the case of two parameter logistic modeling, $\sum_{i=1}^{N} a_{i,N} U_i$ is a
sufficient statistic for $\Theta$ provided $a_{i,N}$ is proportional to the
discrimination parameter of the ith item.

Definition 6.1. A sequence of linear formula scores with coefficients
$\{a_{i,N}, 1 \le i \le N, N \ge 1\}$ is called admissible if

(6.1)    $0 \le a_{i,N} \le K/N$  for all $i$, $N$

for some constant $K$.

Remarks. Several special cases of linear formula scores are admissible. First,
$a_{i,N} = 1/N$ for $i \le N$, yielding $\{\bar{U}_N, N \ge 1\}$, is clearly
admissible. Second, $a_{i,N} = 1/N(C_N)$ for all $i \in C_N$ and equal to zero
otherwise, with $\{C_N, N \ge 1\}$ a nonsparse sequence of integer subsets, clearly
yields an admissible sequence of linear formula scores since by definition
$N(C_N) \ge \epsilon N$ for all $N$ for some $\epsilon > 0$. Third, suppose a two
parameter logistic model for $\{U_i, i \ge 1\}$ with discrimination parameters
$a_i$ satisfying

(6.2)    $0 < \epsilon \le a_i \le K < \infty$  for all $i$.

Then, the normalized sufficient statistic

(6.3)    $\sum_{i=1}^{N} a_i U_i \Big/ \sum_{i=1}^{N} a_i$

is clearly admissible with $a_{i,N} = a_i \big/ \sum_{j=1}^{N} a_j$.
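
The three special cases above can be made concrete with a small sketch. This is
only an illustration under stated assumptions: the helper names are hypothetical,
the test length and the discrimination values are arbitrary, and checking a single
$N$ cannot by itself establish admissibility of a whole sequence. The function
`smallest_admissible_k` reports the smallest $K$ for which (6.1) holds at the given
$N$, so that one can see whether it stays bounded as $N$ changes.

```python
def proportion_correct_weights(n):
    """First case: a_{i,N} = 1/N for every item."""
    return [1.0 / n] * n


def subset_weights(n, subset):
    """Second case: a_{i,N} = 1/N(C_N) on the subset C_N (1-based item indices), else 0."""
    size = len(subset)
    return [1.0 / size if i in subset else 0.0 for i in range(1, n + 1)]


def normalized_2pl_weights(discriminations):
    """Third case: a_{i,N} = a_i / sum_j a_j, as in the normalized score (6.3)."""
    total = sum(discriminations)
    return [a / total for a in discriminations]


def smallest_admissible_k(weights):
    """Smallest K with 0 <= a_{i,N} <= K/N at this N; None if a weight is negative."""
    if min(weights) < 0:
        return None
    return len(weights) * max(weights)


n = 40
first = proportion_correct_weights(n)
second = subset_weights(n, set(range(1, n + 1, 2)))  # odd-numbered items only
# Hypothetical discriminations spread evenly over [0.5, 2.0], so (6.2) holds
# with epsilon = 0.5 and K = 2.0.
discriminations = [0.5 + 1.5 * i / (n - 1) for i in range(n)]
third = normalized_2pl_weights(discriminations)

for name, weights in [("1/N", first), ("subset", second), ("2PL", third)]:
    print(name, "sum =", round(sum(weights), 6),
          "smallest K =", round(smallest_admissible_k(weights), 6))
```

For the third case, (6.2) implies $a_{i,N} \le K/(N\epsilon)$, so the constant in
(6.1) can be taken to be $K/\epsilon$; the value reported by the sketch should not
exceed that bound.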


It can be argued that most useful linear scoring methods surely satisfy
(6.1) since (6.1) merely requires a scoring method where no single item or small
subset of items is allowed undue influence on the overall formula score.
Note that if $\sum_{i=1}^{N} a_{i,N} \to 0$ as $N \to \infty$, then the corresponding
formula score is of no practical interest since

    $U_N \equiv \sum_{i=1}^{N} a_{i,N} U_i \to 0$

with probability one as $N \to \infty$ and hence cannot estimate anything
consistently (other than 0). Thus each admissible linear formula score of interest
should also satisfy for some $\epsilon > 0$

(6.4)    $\sum_{i=1}^{N} a_{i,N} \ge \epsilon$  for all $N \ge 1$.


Definition 6.2. An admissible sequence of linear formula scores with
coefficients satisfying (6.4) is called an admissible nonsparse sequence.

In this regard, note that if $a_{i,N} > 0$ only on a sparse sequence of integer
subsets (i.e., $C_N \subset \{1, \ldots, N\}$, where $N(C_N)/N \to 0$ as
$N \to \infty$), then either (6.1) or (6.4) is violated: if (6.1) holds, then
$\sum_{i=1}^{N} a_{i,N} \le K \, N(C_N)/N \to 0$. That is, an attempt to use a
"sparse" sequence of items to estimate $\Theta$ will either result in an
inadmissible sequence of linear formula scores or one which is sparse (in the
sense that (6.4) fails) and is hence useless.

It  is  now  possible  to  advance  a  theory  very  similar  to  that  of  Sections  2  - 
5  that  includes  admissible  nonsparse  linear  formula  scoring.  For  example,  the 
analogue  of  the  sufficiency  part  of  Theorem  3.2  is  as  follows. 

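Theorem 6.1. Let $\{\underline{U}_N, \Theta\}$ be essentially unidimensional with
respect to $\Theta$. Then, for each given $\Theta = \theta$ and each set of nonsparse
admissible linear formula scores

    $U_N = \sum_{i=1}^{N} a_{i,N} U_i$,

it follows that

(6.5)    $U_N - A_N(\theta) \to 0$

in probability as $N \to \infty$ where

    $A_N(\theta) \equiv \sum_{i=1}^{N} a_{i,N} P_i(\theta)$.

Proof. Fix $\theta$. It suffices to show that

    $\mathrm{Var}\left( \sum_{i=1}^{N} a_{i,N} U_i \,\middle|\, \Theta = \theta \right) \to 0$.

But

    $\mathrm{Var}\left( \sum_{i=1}^{N} a_{i,N} U_i \,\middle|\, \Theta = \theta \right)
      = \sum_{i=1}^{N} a_{i,N}^2 \, \mathrm{Var}(U_i \mid \Theta = \theta)
      + 2 \sum_{1 \le i < j \le N} a_{i,N} a_{j,N} \, \mathrm{Cov}(U_i, U_j \mid \Theta = \theta) \to 0$  as $N \to \infty$

by admissibility and essential unidimensionality; that is, (6.5) holds. □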
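
The last step of the proof can be spelled out as follows. This is a sketch only:
it assumes that the essential independence condition of Section 2 (not restated
here) takes the form
$\binom{N}{2}^{-1} \sum_{1 \le i < j \le N} |\mathrm{Cov}(U_i, U_j \mid \Theta = \theta)| \to 0$,
and it uses the constant $K$ of (6.1) together with the fact that a binary
variable has variance at most 1/4.

```latex
\begin{align*}
\sum_{i=1}^{N} a_{i,N}^{2}\,\mathrm{Var}(U_i \mid \Theta=\theta)
  &\le N \cdot \frac{K^{2}}{N^{2}} \cdot \frac{1}{4}
   = \frac{K^{2}}{4N} \to 0, \\
2\,\Bigl|\sum_{1\le i<j\le N} a_{i,N}\,a_{j,N}\,
      \mathrm{Cov}(U_i,U_j \mid \Theta=\theta)\Bigr|
  &\le \frac{2K^{2}}{N^{2}} \sum_{1\le i<j\le N}
      \bigl|\mathrm{Cov}(U_i,U_j \mid \Theta=\theta)\bigr| \\
  &= \frac{K^{2}(N-1)}{N}\cdot \binom{N}{2}^{-1}
      \sum_{1\le i<j\le N}
      \bigl|\mathrm{Cov}(U_i,U_j \mid \Theta=\theta)\bigr| \to 0 .
\end{align*}
```

Since the conditional mean of the formula score given $\Theta = \theta$ is
$\sum_{i=1}^{N} a_{i,N} P_i(\theta)$, Chebyshev's inequality then gives the
convergence in probability asserted in Theorem 6.1.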

The use of $\bar{U}_N$ as estimator in the development of the theory in Sections
2-5 was done for simplicity and clarity of exposition and not because of
necessity. Further generalization along the lines of Sections 2-5 using
nonsparse admissible linear formula scoring is routine and is left to the
reader.


7. Essential Unidimensionality and Consistent Estimation of $\Theta$ on the
$\theta$ scale.

The use of $\{\bar{U}_N\}$ or more generally of a nonsparse admissible linear
formula score $\{\sum_{i=1}^{N} a_{i,N} U_i\}$ as a sequence of estimators of
$\Theta$ on the intrinsic ability scale when essential unidimensionality holds
supposes a single fixed test administered to one or more populations.
Applications in this setting were developed in Sections 3, 5, and 6. Such
single-test applications of IRT occur less frequently in practice than
multiple-population multiple-test applications.

In multiple-test applications a standardized ability scale is usually
desired, perhaps as a prerequisite to a horizontal equating of the various
tests. In many such applications, the items (or at least a common core of them)
have been calibrated relative to the constructed standardized ability scale
$\theta$. Then a major application is the estimation of individual abilities on
the $\theta$ scale. Estimation of $\Theta$ with known IRFs has been widely treated
in the literature (see for example Hambleton and Swaminathan, 1985, Section 5.3).
Maximum likelihood estimation (MLE) is one method of choice in this setting.
The MLE $\hat{\theta}$ "consistently" estimates $\Theta$ under suitable regularity
conditions in the sense that, given $\Theta = \theta$, $\hat{\theta} \to \theta$ in
probability as the number of items $N \to \infty$. (Here and throughout Section 7,
"consistency" is used in its usual mathematical statistical sense and is not to
be confused with its special usage as given by Definition 3.2.) Only rarely,
however, is it possible to provide an explicit formula for the MLE as a function
of $\underline{U}_N$. Moreover, the MLE is usually a highly non-linear function of
$\underline{U}_N$. Thus in the known IRFs case it seems desirable to seek
alternatives to MLE that are based on linear formula scoring and for which
explicit formulae are available. We now propose a family of such estimators,
using the results of Sections 3 and 6.

Recall from Theorem 3.2 that when $\{\underline{U}_N\}$ is essentially
unidimensional with respect to $\Theta$ then for each given $\Theta = \theta$,

    $\bar{U}_N - A_N(\theta) \to 0$

in probability as $N \to \infty$. This suggests estimating $\Theta$ by
$\{A_N^{-1}(\bar{U}_N)\}$ and also suggests for each given $\Theta = \theta$ that

    $A_N^{-1}(\bar{U}_N) \to \theta$

in probability as $N \to \infty$ should hold. Moreover, recalling Theorem 6.1 and
its notation, this result should generalize to nonsparse admissible linear formula
scoring with, for each $\Theta = \theta$,

    $A_N^{-1}(U_N) \to \theta$

in probability as $N \to \infty$. Theorem 7.1 below states that this is true
provided local asymptotic discrimination holds. Definition 7.1 is the appropriate
analogue of Definition 3.3.

Definition 7.1. Let a sequence of tests be essentially unidimensional with
respect to ability $\Theta$. Let $A_N(\theta) = \sum_{i=1}^{N} a_{i,N} P_i(\theta)$
be formed from a nonsparse admissible sequence of linear formula scores. Suppose
for every fixed $\theta_1$ such that $\theta_1$ is in the range $R$ of $\Theta$ that
there exists $\epsilon_{\theta_1} > 0$ and an open neighborhood $H_{\theta_1}$ of
$\theta_1$ such that for all $\theta_2 \in H_{\theta_1}$ and in the range $R$ of
$\Theta$ with $\theta_2 > \theta_1$ that

(7.1)    $\dfrac{\sum_{i=1}^{N} a_{i,N} P_i(\theta_2) - \sum_{i=1}^{N} a_{i,N} P_i(\theta_1)}{\theta_2 - \theta_1} \ge \epsilon_{\theta_1}$  for all $N$.

Then $\{\underline{U}_N, \Theta, A_N(\theta)\}$ is said to be locally
asymptotically discriminating (LAD) with respect to $\Theta$.


Usually $A_N(\theta)$ is continuous in applications, thus making its inverse well
defined over its range. However, in order to have a theory that allows for
discontinuities, the following definition of $A_N^{-1}(u)$ will be used:

    $A_N^{-1}(u) = \inf_{\theta \in R} \{\theta : A_N(\theta) \ge u\}$.

Here $R$ denotes the range of $\Theta$. Note that $A_N^{-1}(u) = -\infty$ or
$\infty$ is possible; e.g., if $u = 1/5$ and $A_N(\theta) \ge 1/4$ for all $\theta$.

Theorem 7.1. Let $\{\underline{U}_N, \Theta\}$ be essentially unidimensional with
respect to $\Theta$. Suppose $A_N(\theta) = \sum_{i=1}^{N} a_{i,N} P_i(\theta)$ is
formed from a nonsparse admissible sequence of linear formula scores
$U_N = \sum_{i=1}^{N} a_{i,N} U_i$. Suppose
$\{\underline{U}_N, \Theta, A_N(\theta)\}$ is LAD with respect to $\Theta$. Then,
for each given $\Theta = \theta$,

(7.2)    $A_N^{-1}(U_N) \to \theta$

in probability as $N \to \infty$.


Proof. Fix $\theta$. By Theorem 6.1, given $\Theta = \theta$,

(7.3)    $U_N - A_N(\theta) \to 0$

in probability as $N \to \infty$. It is an elementary lemma of probability theory
that $X_N \to X$ in probability as $N \to \infty$ if and only if each subsequence
$\{X_{N(j)}\}$ contains a further subsequence $X_{N(j(k))} \to X$ with probability
one as $k \to \infty$. Thus to prove the theorem, it suffices to select an
arbitrary subsequence $\{N(j)\}$ and then prove there exists a further subsequence
$\{N(j(k))\}$ such that

    $A_{N(j(k))}^{-1}(U_{N(j(k))}) \to \theta$

with probability one as $k \to \infty$. Choose $\{N(j)\}$. Then (7.3) implies that

    $U_{N(j)} - A_{N(j)}(\theta) \to 0$

in probability as $j \to \infty$. Then, using the above mentioned lemma, there
exists a further subsequence $\{N(j(k))\}$ for which

(7.4)    $U_{N(j(k))} - A_{N(j(k))}(\theta) \to 0$

with probability one. By (7.1) and the definition of the inverse, for all
$u_2 - A_N(\theta)$ sufficiently small in magnitude and satisfying
$u_2 > \inf_{\theta} A_N(\theta)$, there exists $K_\theta < \infty$ such that

(7.5)    $|A_N^{-1}(u_2) - \theta| \le K_\theta \, |u_2 - A_N(\theta)|$  for all $N$.

Fix a typical point in the probability space. Now, it may be that for some
arbitrarily large $k$

(7.6)    $U_{N(j(k))} \le \inf_{\theta} A_{N(j(k))}(\theta)$.

By (7.4) and LAD, there exists $\epsilon > 0$ such that for all large $k$

    $U_{N(j(k))} > A_{N(j(k))}(\theta - \epsilon)$.

Thus $U_{N(j(k))} > \inf_{\theta} A_{N(j(k))}(\theta)$ for all sufficiently large
$k$ using LAD. Hence (7.6) cannot hold for arbitrarily large $k$.

Thus, combining (7.5) with (7.4),

    $|A_{N(j(k))}^{-1}(U_{N(j(k))}) - \theta| \le K_\theta \, |U_{N(j(k))} - A_{N(j(k))}(\theta)| \to 0$

with probability one, as required. □


Remarks. (1) Theorem 7.1 provides a large class of sequences of estimators of
$\Theta$, including $\{A_N^{-1}(\bar{U}_N)\}$, based on linear formula scoring. In
practice, one needs to compute $A_N(\theta)$ and its inverse $A_N^{-1}(u)$ to make
use of one of these estimators.

(2) It is to be noted that Holland, Junker, and Thayer (1987) have
proposed using $\{A_N^{-1}(\bar{U}_N)\}$ to estimate the distribution of $\Theta$
and have proved a convergence in distribution result to justify this. Their
motivation for suggesting $\{A_N^{-1}(\bar{U}_N)\}$ is different from ours.

(3) It is elementary to show that (7.2) holding for all $\theta$ implies

(7.7)    $A_N^{-1}(U_N) \to \Theta$

in probability as $N \to \infty$. Given the IRT context, (7.2) is perhaps a more
interesting formulation than (7.7). It does of course follow from (7.7) that
$A_N^{-1}(U_N)$ can be used as a method of estimating the distribution of $\Theta$.

(4)  Note  that  (7.2)  states  that  convergence  in  probability  to  individual 
ability  holds  regardless  of  which  of  a  large  class  of  estimators  is  used.  That 
is,  convergence  in  probability  to  individual  ability  holds  for  every  nonsparse 
admissible  sequence  of  linear  formula  scores. 
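
Remark (1) notes that using these estimators requires computing the scale
function and its inverse. The following is a minimal sketch of one way this might
be done when the item response functions are known. It assumes a two parameter
logistic form with made-up item parameters, uses the normalized discrimination
weights of (6.3), and approximates the infimum in the definition of the inverse by
bisection on a bounded interval; the function names, the interval $[-6, 6]$, and
the convention of returning infinite values outside the attained range are
illustrative choices, not part of the paper.

```python
import math
import random


def irf_2pl(theta, a, b):
    """Two parameter logistic item response function P_i(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def scale_function(theta, items, weights):
    """A_N(theta) = sum_i a_{i,N} P_i(theta) for known item parameters."""
    return sum(w * irf_2pl(theta, a, b) for w, (a, b) in zip(weights, items))


def inverse_scale(u, items, weights, lo=-6.0, hi=6.0, tol=1e-8):
    """Approximate inf{theta in [lo, hi] : A_N(theta) >= u} by bisection.

    Relies on A_N being nondecreasing. Returns -inf or +inf to signal that u
    lies outside the values A_N attains on the search interval.
    """
    if scale_function(lo, items, weights) >= u:
        return -math.inf
    if scale_function(hi, items, weights) < u:
        return math.inf
    # Invariant: A_N(lo) < u <= A_N(hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if scale_function(mid, items, weights) >= u:
            hi = mid
        else:
            lo = mid
    return hi


rng = random.Random(0)
n_items = 400
# Hypothetical (discrimination, difficulty) pairs; not taken from any real test.
items = [(rng.uniform(0.5, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(n_items)]
total_a = sum(a for a, _ in items)
weights = [a / total_a for a, _ in items]

true_theta = 0.7
responses = [1 if rng.random() < irf_2pl(true_theta, a, b) else 0 for a, b in items]
score = sum(w * u for w, u in zip(weights, responses))  # linear formula score
theta_hat = inverse_scale(score, items, weights)
print(f"formula score {score:.4f}, estimated theta {theta_hat:.4f}")
```

In this sketch the responses are generated as conditionally independent given the
true ability, which is stronger than essential independence; it is meant only to
show the mechanics of inverting the scale function, and longer tests should bring
the estimate closer to the generating value.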


8.  Discussion  and  Summary  of  Results.  The  purpose  of  the  paper  is  to  argue 
that  a  successful  approach  to  certain  fundamental  test  measurement  topics  such 
as  bias  requires  multidimensional  latent  modeling.  The  paper  provides  a  new 
conceptualization  of  latent  dimensionality,  essential  dimensionality.  This 
conceptualization  depends  on  the  replacement  of  local  independence  by  the  weaker 


and,  in  our  opinion,  psychometrically  more  appropriate  notion  of  essential 


independence. Essential dimensionality, designed to mesh with the empirical


reality  of  multiply  determined  items,  attempts  to  count  only  the  dominant 
dimensions.  Theorems  2.2  and  2.3  present  conditions  that  guarantee  that 
essential  unidimensionality  holds. 


In  Section  3,  essential  unidimensionality  is  shown  in  Theorems  3.2  and  3.3 
to  characterize  the  consistent  estimation  of  a  unidimensional  latent  trait. 

Here the consistent estimation of $\Theta$ is defined precisely in Definition 3.2.


In  order  to  facilitate  this,  the  concepts  of  marginal  item  response  functions 
and  intrinsic  ability  scale  are  presented.  This  theory  is  applicable  to 
single-test applications and does not require that the IRFs be known (i.e.,
calibrated).

Theorem 3.3 shows that essential unidimensionality guarantees, under the
mild regularity condition of local asymptotic discrimination of
$\{\underline{U}_N\}$, that the latent ability is unique up to monotone
transformations. That is, essential unidimensionality, an empirically testable
condition, guarantees that the latent trait of the model is "well defined" in the
sense that it is completely specified, up to a monotone transformation. Loosely
stated, the "data" (distributions of the $\underline{U}_N$'s) determine the latent
trait when essential unidimensionality holds. The above results, as with most of
the results of the paper, require an infinite item pool formulation. It is the
author's position that such a formulation facilitates the study of many
foundational IRT issues.

Example 3.3 shows that the concept of essential $d_E$ dimensionality for some
$d_E < \infty$ is not vacuous by showing that test sequences exist that could be
intuitively described as essentially $\infty$-dimensional. It is pointed out that
essential unidimensionality, when other (minor) dimensions besides the ability
of interest $\Theta$ are present, does not rule out non-random pre-asymptotic bias
in the estimation of $\Theta$ for short tests.

In Section 4, the uncritical acceptance of the centrality of the role of
item  parameter  invariance  is  challenged.  In  particular,  Example  4.1  shows  that 
invariance  (precisely,  unidimensional  invariance)  can  hold  and  yet  pervasive 
bias  against  a  particular  group  still  exist.  It  is  shown  that  unidimensional 
invariance  is  equivalent  to  traditional  unidimensionality  holding.  Essential 
d-dimensional  invariance  is  defined  by  replacing  local  independence  by  essential 
independence  in  the  definition  of  invariance.  Then  essential  unidimensional 
invariance  is  shown  to  be  equivalent  to  essential  unidimensionality.  These 
results  motivate  a  simple  latent  trait  based  definition  of  validity,  namely  that 
validity  holds  if  (i)  essential  unidimensionality  holds  and  (ii)  the  (unique  by 
Theorem 3.4) essential trait $\Theta$ is the ability intended to be measured.

Section 5 addresses the issue of test bias in a single-test two group
setting from the viewpoint of consistent estimation. Essential unidimensionality
is shown to guarantee consistent estimation of ability in both groups. Thus
the issue of test bias can be analyzed assuming that essential $d_E$
dimensionality holds for some $d_E > 1$. The test bias problem is then
characterized as the estimation of the intended to be measured $\Theta$ for a
two-group latent model with essential $d_E$ dimensional latent ability
$(\Theta, \underline{\Theta}_2)$, where the "nuisance" ability
$\underline{\Theta}_2$ is $d_E - 1$ dimensional. Test bias for Groups B and C is
defined as the conditional distributions of $\underline{\Theta}_2^B$ given
$\Theta^B = \theta$ and $\underline{\Theta}_2^C$ given $\Theta^C = \theta$
differing for at least one value of $\theta$. It is stressed that the marginal
distributions of $\Theta^B$ and $\Theta^C$ play no role in the definition.
Example 5.1 demonstrates the role that test bias as herein defined plays in the
attempts to estimate $\Theta$ in Groups B and C. Finally, Theorem 5.3 shows,
recalling that $d_E > 1$ is assumed, that the lack of test bias implies that
$\bar{U}_N^B$ and $\bar{U}_N^C$ are equally inconsistent in their attempts to
estimate $\Theta$; that is, neither population B nor C is favored over the other.

Section 6 demonstrates that the theory of Sections 1-5 need not be
presented only in the context of the behavior of $\bar{U}_N$, but that admissible
linear formula scoring can be used as a basis for Sections 2-5 with only minor
alterations required. This development is largely left to the reader. Here an
admissible linear formula score $\{\sum_{i=1}^{N} a_{i,N} U_i, N \ge 1\}$ is one
where $0 \le a_{i,N} \le K/N$ for all $i$ and some fixed $K < \infty$. It is noted
that most linear formula scores of interest are admissible.

Section 7 addresses the problem of estimating $\Theta$ on the $\theta$ scale in
the multiple-test multiple-population problem with known IRFs assumed. Theorem 7.1
establishes that $\Theta$ can be consistently estimated on the $\theta$ scale by a
large class of sequences of estimators in the sense that for each such sequence,

    $A_N^{-1}(U_N) \to \theta$

in probability as $N \to \infty$. Each such sequence is computable, has an explicit
formula, and is based on an admissible linear formula score in an intuitively
natural way.